Redis源码阅读之渐进式rehash

June 22, 2024

Redis版本：7.0.2

在哈希表扩容或缩容时，如果一次迁移完所有数据，可能会导致性能阻塞问题。为了避免出现这种情况，Redis采用了渐进式 rehash的方式，将rehash操作分为多次进行，以确保系统的稳定性。在rehash过程中，redis会逐步将旧哈希表中的数据迁移到新哈希表中，直到旧哈希表中的所有数据都迁移完毕。

实际上，不是只有hash类型会用到dict，list、set、zset等类型也会用到dict。而每个database中也会有一个全局字典和过期字典。

我们今天就来阅读一下redis中渐进式rehash的实现。

dict-Redis中的字典类型定义 #

在redis中，字典（dict）类型的定义如下：

/*****************dict.h********************/
struct dict {
    dictType *type;

    dictEntry **ht_table[2];
    unsigned long ht_used[2];

    long rehashidx; /* rehashing not in progress if rehashidx == -1 */

    /* Keep small vars at end for optimal (minimal) struct padding */
    int16_t pauserehash; /* If >0 rehashing is paused (<0 indicates coding error) */
    signed char ht_size_exp[2]; /* exponent of size. (size = 1<<exp) */
};

其中字段介绍如下：

type：字典类型
ht_table：哈希表数组，ht_table[0]为旧哈希表，ht_table[1]为新哈希表。添加喜暖苏时，如果正在rehash，会写入ht_table[1]，否则会写入ht_table[1]。
ht_used：哈希表已使用的槽位数
rehashidx：当前正在迁移的槽位索引，当rehashidx为-1时，表示没有rehash操作正在进行红。。
pauserehash：是否暂停rehash操作。
ht_size_exp：哈希表大小的指数值。

dictRehash #

/*******************************dict.c*********************************/
/* Performs N steps of incremental rehashing. Returns 1 if there are still
 * keys to move from the old to the new hash table, otherwise 0 is returned.
 * 执行N次增量rehash。如果在旧hash表中仍然有需要移动的键，返回1，否则返回0.
 *
 * Note that a rehashing step consists in moving a bucket (that may have more
 * than one key as we use chaining) from the old to the new hash table, however
 * since part of the hash table may be composed of empty spaces, it is not
 * guaranteed that this function will rehash even a single bucket, since it
 * will visit at max N*10 empty buckets in total, otherwise the amount of
 * work it does would be unbound and the function may block for a long time. 
 * 请注意，rehashing的一步操作将一个bucket（应为我们用了链表，所以实际上可能会有多个key）
 * 从旧的哈希表移动到新的哈希表。然而，由于哈希表的部分可能是空白空间，因此不能保证这个函数
 * 一定会rehash一个桶。它最多会访问N*10个空桶，否则它的工作量将是无限的，函数可能会阻塞很长时间。
 * */
int dictRehash(dict *d, int n)
{
    int empty_visits = n * 10; /* Max number of empty buckets to visit. */
    if (!dictIsRehashing(d))
        return 0;

    while (n-- && d->ht_used[0] != 0)
    {
        dictEntry *de, *nextde;

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(DICTHT_SIZE(d->ht_size_exp[0]) > (unsigned long)d->rehashidx);
        while (d->ht_table[0][d->rehashidx] == NULL)
        {
            d->rehashidx++;
            if (--empty_visits == 0)
                return 1;
        }
        de = d->ht_table[0][d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while (de)
        {
            uint64_t h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & DICTHT_SIZE_MASK(d->ht_size_exp[1]);
            de->next = d->ht_table[1][h];
            d->ht_table[1][h] = de;
            d->ht_used[0]--;
            d->ht_used[1]++;
            de = nextde;
        }
        d->ht_table[0][d->rehashidx] = NULL;
        d->rehashidx++;
    }

    /* Check if we already rehashed the whole table... */
    if (d->ht_used[0] == 0)
    {
        zfree(d->ht_table[0]);
        /* Copy the new ht onto the old one */
        d->ht_table[0] = d->ht_table[1];
        d->ht_used[0] = d->ht_used[1];
        d->ht_size_exp[0] = d->ht_size_exp[1];
        _dictReset(d, 1);
        d->rehashidx = -1;
        return 0;
    }

    /* More to rehash... */
    return 1;
}

逻辑为：

校验rehash状态。
遍历旧hash表ht_table[0]，从下标rehashidx开始，找到第一个非空桶de。

此处请注意，有一个整型变量empty_visits = n*10，用于限制最大访问空桶的次数，如果遍历ht_table[0]十次还没有找到非空桶，则会直接中断短讯，返回1，避免rehash操作阻塞太久。

遍历de链表，将每个元素移动到新hash表ht_table[1]中。
如果旧hash表ht_table[0]中的桶移动全部移动到了新桶中，则释放旧hash表内存，将新hash表赋值给旧hash表，重置rehash状态，返回0。
如果还有未迁移的桶，则返回1。

小结 #

字典dict的哈希表(ht_table)是一个二维数组，我们可以认为ht_table[0]为旧哈希表，ht_table[1]为新哈希表。
每次添加元素时，如果正在进行rehash，添加到ht_table[1]中，否则，添加到ht_table[0]中。
Redis的哈希表用的是拉链法，即数组+链表的方式，每个槽位上的元素是一个链表。
每次迁移时，实际上不是只迁移一个key，而是将这个桶（或者说槽）的链表上的所有key都迁移到新哈希表中。
为了避免rehash操作阻塞太久，每次rehash操作最多只会访问10个空桶。