Today, I didn't understand the part about how the put function retrieves the index from the hash table in the Java source code. After some in-depth investigation, I finally understood the reason behind it. Just recording it down.
Today, I didn't understand the part about how the put function retrieves the index from the hash table in the Java source code. After some in-depth investigation, I finally understood the reason behind it. Just recording it down.
First, it is important to understand that the length of the hash table in HashMap is 2^n, so we have the following formula:
(n - 1) & hash = hash % n
where n is the length of the hash table and is a power of 2 (note the order). The hash%n is the common method we use to map the hash value to the corresponding table position, which is done by taking the modulus of the hash with the table length, resulting in a number that is definitely smaller than the table length. Then, we simply place the value in the corresponding position. The key point is that the expression (n - 1) & hash is a bitwise AND operation between the length of the table minus 1 and the hash. First, let’s explain the bitwise AND operation:
The AND operation is a binary operation:
Operation rules: 0&0=0;0&1=0;1&0=0;1&1=1; meaning that only when both are 1 can the result be 1, otherwise it is 0.
For example:
01011 (decimal 11)
00111 (decimal 7)
-—–
00011 (decimal 3)
OK, while researching, I also found information related to bitwise operations:
Symbol | Description | Operation Rules |
---|---|---|
& | AND | Both bits must be 1 for the result to be 1 |
| | OR | Both bits must be 0 for the result to be 0 |
^ | XOR | Same bits yield 0, different bits yield 1 |
~ | NOT | 0 becomes 1, 1 becomes 0 |
<< | Left Shift | Each bit shifts left by a number of places, high bits are discarded, low bits are filled with 0 |
>> | Right Shift | Each bit shifts right by a number of places; for unsigned numbers, high bits are filled with 0, for signed numbers it varies |
Left shifting in binary is equivalent to multiplying by 2, while right shifting is equivalent to dividing by 2.
For example, in decimal: shifting 10 left becomes 100, which is multiplying by ten. Shifting right becomes 1, which is equivalent to dividing by ten; very interesting. Amazing.
Now, back to the topic of modulus, how can a proper modulus operation turn into an AND operation?
The right shift operation mentioned above, for binary, is equivalent to dividing by 2, but! Here’s the key point. The number that is shifted out is the remainder of the division by 2! Isn’t that amazing? Let’s look at an example:
101001>>1 = 10100 remainder 1 corresponds to 41%2 remainder 1, continuing up
101001>>2 = 1010 remainder 01 corresponds to 41%4 remainder 1, continuing
101001>>3 = 101 remainder 001 corresponds to 41%8 remainder 1, go on
101001>>4 = 10 remainder 1001 corresponds to 41%16 remainder 9
So for hash%x, when x is 2^n, it equals the low n bits of hash in binary.
How do we get the low n bits? We find that the number 2^n always has n+1 bits with 1 followed by all 0s, for example:
10 is 2, 100 is 4, 1000 is 8, 10000 is 16,
Then the corresponding 2^n-1 is a number with n+1 bits of 0 and the rest 1s, for example:
01 is 1, 011 is 3, 0111 is 7, 01111 is 15. When this number performs a bitwise AND operation with hash, it perfectly retains the low n bits of hash, that’s how amazing it is.
Thus, hash%n=hash&(n-1) when n is a power of 2. So what are the benefits of doing this? It greatly improves efficiency; bitwise operations (&) are much more efficient than modulus operations (%), mainly because bitwise operations directly manipulate memory data without needing to convert to decimal, hence the processing speed is very fast. This is a frequently executed part for retrieving indices from the hash table in HashMap. This greatly saves overhead.
Quoting a saying from a big shot, it really describes it thoroughly:
If the length of the array is a power of 2, which is an even number, then length-1 is an odd number. The last bit of an odd number in binary is 1, so regardless of how much hash(key) is, the last bit of hash(key)&(length-1) in binary could be 0 or 1, depending on hash(key). That is, if hash(key) is odd, it will be mapped to an odd position in the array.
If the length is an odd number, then length-1 is an even number, and the last bit of an even number in binary is 0, so regardless of whether hash(key) is odd or even, that element will be mapped to an even position in the array, with no elements in odd positions!
Therefore, when the length of the array is a power of 2, the probability of hash collisions can be reduced by at least half, and all elements can also be evenly distributed in the array.
Note: The modulus operation: a % b is equivalent to a - (a / b) * b.
To clarify this part, I really searched for a lot of information. I have to say that Baidu is really terrible. The company also cannot access Google and StackOverflow, which affects the efficiency of information retrieval.
However, this article below was the trigger that enlightened me.
Bitwise AND Operation and Modulus