一致性哈希Java实现

在缓存服务的负载均衡领域,会遇到一种问题:考虑到我们可以通过哈希算法实现用户结点到服务结点的常数时间映射,但是如果服务器遇到上下线的问题,会使得原本的映射面临大范围的失效。比如用户A指向5号服务器,某个时刻5号服务器突然宕机,其他的服务器则需要替补5号服务的位置。此时我们使用的是直接映射方案,因此除了5号服务器,也包括替补服务器(可能是6号替补5号,这时原本6号的位置需要7号替补,以此类推)的全部缓存数据均需要更新。

这就带来了不一致问题。

分析

这种不一致问题的根源是直接映射的不灵活性造成的,相当于任何一个用户一旦得到Hash值就会与某一特定服务器永久绑定。缺点就是服务端的变化会带来大面积的重新映射,而新的映射被迫要重新加载缓存。考虑到如果用户A的服务器映射失效,那么只需要把用户A重新关联到其他服务器上,这样就避免了很多不必要的影响,也既维护了一致性。

解决方案

使用 Consistent Hashing,理论分析参考 原始论文。基本原理就是从直接的映射变为间接映射,每次算出Hash值后再多一个寻找临近服务器的过程。

图源自网络,侵删

可以使用一些高效的数据结构维护服务器列表,Key是Hash值,Value是服务器地址。要求具备有序性,可以用TreeMap或者SkipList,二者的效率都很高。考虑到哈希的场景通常用于负载均衡领域,所以我们还要照顾平衡性,也就是尽可能均匀。在Hash算法的选取上可以使用MD5,想象所有的映射在一个环上,用户结点对应的服务器就是在环上与自己距离最近的一个,如果当前服务器下线,那么重新寻找与自己距离最近的就可以了。

在环上有一种可能,如果所有的服务器分布比较集中,也容易造成用户结点映射不均衡。这时我们引入虚拟结点的概念,每一个真实结点对应N个虚拟结点,把每个虚拟结点都映射到环上,这样可以很大程度避免不平衡的问题。而用户只要找到虚拟结点就能映射到真实结点了。

参考

在计算MD5的时候,用到了一些掩码技巧,请参考这里。关于Java自身的MD5算法的解读,请参考这里

完整实现

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;
import java.util.concurrent.ConcurrentSkipListMap;

public class ConsistentHash<T> {

//跳表对于查找和遍历均有不错的执行效率
private volatile SortedMap<Long, T> virtualNodes = new ConcurrentSkipListMap();
private MessageDigest md5 = null;
private final int VIRTUAL_NODES;

//初始服务器
private static String[] servers = {
"192.168.0.1:8080",
"192.168.0.2:8080",
"192.168.0.3:8080",
"192.168.0.4:8080"
};

public ConsistentHash(int numberOfVirtualNodes){
this.VIRTUAL_NODES = numberOfVirtualNodes;
for (String x : servers){
addServer((T) x);
}
}

public void addServer(T serverAddr){
System.out.println("True Node: " + serverAddr.toString() + " will be put in the circle");
for (int i = 0; i < VIRTUAL_NODES; i++) {
String nodeAddr = serverAddr.toString() + "#Virtual:" + i;
long hash = getConsistentHash(nodeAddr);
System.out.println("Virtual Node: " + nodeAddr + " has been added with hash code: " + hash);
virtualNodes.put(hash, serverAddr);
}
}

public void deleteServer(T deleteAddr){
System.out.println("True Node: " + deleteAddr.toString() + " will be removed from the circle");
for (int i = 0; i < VIRTUAL_NODES; i++) {
String nodeAddr = deleteAddr.toString() + "#Virtual:" + i;
long hash = getConsistentHash(nodeAddr);
System.out.println("Virtual Node: " + nodeAddr + " has been removed by hash code: " + hash);
virtualNodes.remove(hash);
}
}

private long getConsistentHash(String addr){
if(md5 == null) {
try {
md5 = MessageDigest.getInstance("MD5");
}catch (NoSuchAlgorithmException e){
throw new IllegalStateException(e);
}
}

//重置
md5.reset();
//添加要计算的摘要信息
md5.update(addr.getBytes());
//计算摘要
byte[] bKey = md5.digest();

/* 得到的byte分为4段,分段取出,然后组合在一起
* byte是有符号8位,long比byte长,这时候如果想实现无符号扩展,要用掩码来限制
* 0xFF默认int类型,最低8位是1111 1111,前面24位均为0
*/
long result = ((long) (bKey[3] & 0xFF) << 24) | ((long) (bKey[2] & 0xFF) << 16 |
((long) (bKey[1] & 0xFF) << 8) | (long) (bKey[0] & 0xFF));

return result & 0xffffffffL;
}

public T getServer(Object userNode){
if (!virtualNodes.isEmpty()) {
long hash = getConsistentHash((String) userNode);
SortedMap<Long, T> tailMap = virtualNodes.tailMap(hash);
hash = tailMap.isEmpty() ? virtualNodes.firstKey() : tailMap.firstKey();

return virtualNodes.get(hash);
}
return null;
}

public static void main(String[] args){
//test case here
ConsistentHash<String> consistentHash = new ConsistentHash<>(3);

System.out.println(consistentHash.getServer("192.168.155.155:8080"));
System.out.println(consistentHash.getServer("192.168.2.155:8080"));
System.out.println(consistentHash.getServer("192.168.155.134:8080"));

consistentHash.deleteServer("192.168.0.1:8080");

System.out.println(consistentHash.getServer("192.168.155.155:8080"));
System.out.println(consistentHash.getServer("192.168.2.155:8080"));
System.out.println(consistentHash.getServer("192.168.155.134:8080"));
}
}

运行结果

注意到,在删除服务结点192.168.0.1:8080后,原本指向192.168.0.4:8080的用户结点不受影响。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
True Node: 192.168.0.1:8080 will be put in the circle
Virtual Node: 192.168.0.1:8080#Virtual:0 has been added with hash code: 3991322147
Virtual Node: 192.168.0.1:8080#Virtual:1 has been added with hash code: 2054636666
Virtual Node: 192.168.0.1:8080#Virtual:2 has been added with hash code: 1058881351
True Node: 192.168.0.2:8080 will be put in the circle
Virtual Node: 192.168.0.2:8080#Virtual:0 has been added with hash code: 349956815
Virtual Node: 192.168.0.2:8080#Virtual:1 has been added with hash code: 2893347203
Virtual Node: 192.168.0.2:8080#Virtual:2 has been added with hash code: 967431380
True Node: 192.168.0.3:8080 will be put in the circle
Virtual Node: 192.168.0.3:8080#Virtual:0 has been added with hash code: 713816566
Virtual Node: 192.168.0.3:8080#Virtual:1 has been added with hash code: 3143344329
Virtual Node: 192.168.0.3:8080#Virtual:2 has been added with hash code: 2073965171
True Node: 192.168.0.4:8080 will be put in the circle
Virtual Node: 192.168.0.4:8080#Virtual:0 has been added with hash code: 999813558
Virtual Node: 192.168.0.4:8080#Virtual:1 has been added with hash code: 1089387445
Virtual Node: 192.168.0.4:8080#Virtual:2 has been added with hash code: 3541119725
192.168.0.1:8080
192.168.0.1:8080
192.168.0.4:8080
True Node: 192.168.0.1:8080 will be removed from the circle
Virtual Node: 192.168.0.1:8080#Virtual:0 has been removed by hash code: 3991322147
Virtual Node: 192.168.0.1:8080#Virtual:1 has been removed by hash code: 2054636666
Virtual Node: 192.168.0.1:8080#Virtual:2 has been removed by hash code: 1058881351
192.168.0.3:8080
192.168.0.3:8080
192.168.0.4:8080

Process finished with exit code 0