HashMap#
1. Basic Concepts#
In JDK 8 and later versions, HashMap introduced a red-black tree structure, changing its underlying data structure from array + linked list to array + linked list / red-black tree (each bucket holds either a linked list or a red-black tree).
(Figure: overall structure of HashMap)
(Figure: red-black tree structure of HashMap after JDK 1.8)
Conversion to Red-Black Tree#
Reference: https://juejin.im/entry/5839ad0661ff4b007ec7cc7a
Several parameters:
// The treeify threshold: when the number of nodes in a bucket reaches this value, the linked list is replaced by red-black tree nodes.
static final int TREEIFY_THRESHOLD = 8;
// The un-treeify threshold: during a resize, if the number of nodes left in a bucket is at most this value, the tree is restored (split) back into a linked list.
static final int UNTREEIFY_THRESHOLD = 6;
// The minimum table capacity for treeification: buckets may be treeified only when the table's capacity is at least this value; below it, an over-full bucket triggers a resize instead of treeification.
// To avoid conflicts between resizing and treeification, this value must be at least 4 * TREEIFY_THRESHOLD.
static final int MIN_TREEIFY_CAPACITY = 64;
When adding an element, if the bucket's linked list already holds 8 (TREEIFY_THRESHOLD) nodes, the insertion attempts to convert the list into a red-black tree:
// In treeifyBin(): if the table's capacity is less than 64 (MIN_TREEIFY_CAPACITY), expand instead of converting to a red-black tree.
if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY) {
    resize();
}
// In putVal(): TREEIFY_THRESHOLD defaults to 8.
if (binCount >= TREEIFY_THRESHOLD - 1) {
    treeifyBin(tab, hash);
}
When resize() is called to expand the table and a bucket holds a red-black tree, split() is invoked; if the number of nodes that remain in a resulting bucket is at most the un-treeify threshold UNTREEIFY_THRESHOLD (default 6), the tree is restored (untreeified) into a linked list:
if (lc <= UNTREEIFY_THRESHOLD) {
tab[index] = loHead.untreeify(map);
}
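As a rough sketch of this behavior (assuming a hypothetical CollidingKey class whose constant hashCode forces every entry into one bucket, and an initial capacity of 64 so the MIN_TREEIFY_CAPACITY check passes):
import java.util.HashMap;

public class TreeifyDemo {
    // Hypothetical key with a constant hashCode: every instance collides into the same bucket.
    static final class CollidingKey {
        final int id;
        CollidingKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        // Capacity 64 >= MIN_TREEIFY_CAPACITY, so treeification is not deferred by a resize.
        HashMap<CollidingKey, Integer> map = new HashMap<>(64);
        for (int i = 1; i <= 9; i++) {
            map.put(new CollidingKey(i), i); // the 9th insertion converts the bucket to a tree
        }
        System.out.println(map.size()); // 9 entries, all in one (now tree-structured) bucket
    }
}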
2. Understanding Default Parameters of HashMap#
Why is the default length of HashMap's Entry array 16? Why must the array length be a power of 2?#
First, let's look at how HashMap maps a key's hash value to a bucket index (JDK 1.7's indexFor; JDK 1.8 inlines the same (n - 1) & hash computation):
static int indexFor(int h, int length) {
return h & (length-1);
}
With a length of 16 or any other power of 2, every bit of length - 1 is set to 1, so h & (length - 1) keeps exactly the low-order bits of the hash value (a fast modulo by a power of two). As long as the input hash codes are evenly distributed, the resulting indexes will be uniform as well.
Thus: the default length of 16 and the power-of-2 requirement exist so that the bit mask uses every bucket and spreads keys evenly, reducing the probability of hash collisions.
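A quick illustration of why the mask only behaves well for powers of two (illustrative values, not JDK source):
int h = "key".hashCode();
// length = 16: length - 1 = 0b1111, so the index is exactly the low 4 bits of the hash.
int index = h & (16 - 1); // equivalent to h % 16 for a non-negative h
// If length were 15, length - 1 = 0b1110: the lowest index bit would always be 0,
// so every odd-numbered bucket would be unreachable.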
Why is the load factor for HashMap expansion limited to 0.75?#
Expansion in HashMap creates a new array double the length of the old one and transfers every entry from the old Entry array into it, which makes expansion a resource-intensive operation.
The trigger condition for expansion is: threshold = array capacity x load factor.
- If the load factor were 0.5 or lower, the threshold would be reached while much of the array is still empty, wasting allocated memory and causing frequent expansions.
- If the load factor were 1, the Entry array would have to fill up completely before expanding, producing many hash collisions, long linked lists, and slower get queries.
- Therefore a compromise between 0.5 and 1, namely 0.75, is chosen to balance space utilization against lookup efficiency, as illustrated below.
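For example, with the default parameters (a small arithmetic sketch):
int capacity = 16;                             // DEFAULT_INITIAL_CAPACITY
float loadFactor = 0.75f;                      // DEFAULT_LOAD_FACTOR
int threshold = (int) (capacity * loadFactor); // 12: the 13th insertion triggers a resize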
3. Basic Operations of HashMap#
get Operation#
/**
* Calculate the hash value of the key using the hash(key) method.
* Retrieve the node using the getNode() method.
* If the node is null, return null; otherwise, return node.value.
*/
public V get(Object key) {
Node<K,V> e;
return (e = getNode(hash(key), key)) == null ? null : e.value;
}
/**
* getNode can be divided into the following steps:
* 1. If the hash table is empty or the bucket corresponding to the key is empty, return null.
* 2. If not empty, if the first node in the bucket matches the specified hash and key, return this node directly.
* 3. If the first node does not match, proceed with subsequent searches:
* 3.1 If the current bucket uses a red-black tree, call the red-black tree's getTreeNode to retrieve the node value.
* 3.2 If it is a linked list structure, traverse the linked list directly to search.
* 4. Return the value if the node is found; otherwise, return null.
*/
final Node<K,V> getNode(int hash, Object key) {
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
// If the hash table and the bucket corresponding to the key are not empty
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) {
// Optimization point 1: Reduce unnecessary queries
// If the first node in the bucket matches the specified hash and key, return this node directly.
if (first.hash == hash && // always check first node
((k = first.key) == key || (key != null && key.equals(k))))
return first;
// If the first node does not match, proceed with subsequent searches.
if ((e = first.next) != null) {
// If the current bucket uses a red-black tree, call the red-black tree's getTreeNode to retrieve the node value.
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
// If it is a linked list structure, traverse the linked list directly to search.
do {
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
}
return null;
}
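One practical consequence of this return convention (a small usage sketch): a null result cannot distinguish an absent key from a key mapped to null.
HashMap<String, String> map = new HashMap<>();
map.put("a", null);
System.out.println(map.get("a"));         // null: key present, value is null
System.out.println(map.get("b"));         // null: key absent
System.out.println(map.containsKey("a")); // true: containsKey tells the two apart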
put Operation#
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}
/**
* The putVal method can be divided into the following steps:
* 1. If the hash table is empty, call the resize method to create it.
* 2. If the specified key does not conflict, i.e., there is no hash collision, insert the key-value pair directly.
* 3. If there is a conflict, find the conflicting node.
* 3.1 If the first node matches, record it; otherwise, continue searching.
* 3.2 If the current structure is a red-black tree, call the putTreeVal method to insert.
* 3.3 If it is a linked list structure, loop through the linked list to search. If not found, use tail insertion to insert the node; if found, record this node and exit the loop.
* 4. If a conflicting node is found, replace its old value with the new one unless onlyIfAbsent is true and the old value is non-null, then return the old value.
* 5. Otherwise, increment the modification count modCount and the size by 1. If size > threshold, i.e., the critical value is reached, perform expansion.
*
* @param hash The hash value of the specified key.
* @param key The specified key.
* @param value The value to be inserted.
* @param onlyIfAbsent If true, do not replace an existing non-null value.
* @param evict If false, the table is in creation mode.
* @return If the specified key exists and the old value is replaced, return this value; otherwise, return null. Of course, the value may also be null.
*/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
// If the hash table is empty, call the resize method to create it.
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
// If the specified key does not conflict, i.e., there is no hash collision, insert the key-value pair directly.
// (n - 1) & hash is equivalent to hash % n when n is a power of 2, but the bitwise operation is faster.
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
// If there is a conflict, find the conflicting node.
else {
Node<K,V> e; K k;
// If the first node matches, record it; otherwise, continue searching.
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
// If the current structure is a red-black tree, call the putTreeVal method to insert.
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
// If it is a linked list structure, loop through the linked list to search. If not found, use tail insertion to insert the node.
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
// Check if the length of the linked list has reached the critical value; if so, convert it to a red-black tree structure.
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
// If found, record this node and exit the loop.
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
// If e is not null, a node with the same key already exists.
if (e != null) { // existing mapping for key
V oldValue = e.value;
// Replace the old value unless onlyIfAbsent is true and the old value is non-null.
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}
++modCount;
// Check if size has reached the critical value and needs to be expanded.
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}
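A small usage sketch of putVal's return convention: put returns the previous value (or null), and putIfAbsent is the same putVal call with onlyIfAbsent = true.
HashMap<String, Integer> map = new HashMap<>();
System.out.println(map.put("k", 1));         // null: no previous mapping
System.out.println(map.put("k", 2));         // 1: old value returned, value replaced
System.out.println(map.putIfAbsent("k", 3)); // 2: onlyIfAbsent = true, existing value kept
System.out.println(map.get("k"));            // 2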
Expansion Operation#
/**
* Initialize or expand the table; during expansion, the capacity is always doubled. After rehashing, each original node either remains at its original index or moves to "original index + old capacity".
* Summary of the resize steps:
* 1. Calculate the capacity and critical value after expansion.
* 2. Update the critical value of the HashMap.
* 3. Create a new array based on the calculated expansion capacity, then point the HashMap's table reference to the new array.
* 4. Copy the elements from the old array into the new array.
*/
final Node<K,V>[] resize() {
// Save a reference (oldTab) to the table as it was before expansion.
Node<K,V>[] oldTab = table;
// Get the length of the old array.
int oldCap = (oldTab == null) ? 0 : oldTab.length;
// Old array's critical value.
int oldThr = threshold;
int newCap, newThr = 0;
// If the old array capacity is greater than 0.
if (oldCap > 0) {
// If the old array capacity exceeds the maximum value, set its critical value to the maximum integer value.
if (oldCap >= MAXIMUM_CAPACITY) {
threshold = Integer.MAX_VALUE;
return oldTab;
}
// If the doubled capacity is still below the maximum and the old capacity is at least the default (16), double the threshold as well.
else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
oldCap >= DEFAULT_INITIAL_CAPACITY)
newThr = oldThr << 1; // double threshold
}
// If old capacity <= 0, and critical value > 0.
else if (oldThr > 0) // initial capacity was placed in threshold
// Set the new array capacity to the old array's critical value.
newCap = oldThr;
// If old capacity <= 0, and critical value <= 0.
else {
// Set the new capacity to the default value (16).
newCap = DEFAULT_INITIAL_CAPACITY;
// Set the new critical value to default capacity * default load factor (16 * 0.75) = 12.
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
// newThr is still 0 here when the new capacity was taken from oldThr, or when the doubled-threshold branch above did not apply.
if (newThr == 0) {
// Compute the new threshold and bounds-check it against the maximum capacity.
float ft = (float)newCap * loadFactor;
newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
(int)ft : Integer.MAX_VALUE);
}
threshold = newThr;
// Create the new array, then copy the old array's contents into it below.
@SuppressWarnings({"rawtypes","unchecked"})
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;
if (oldTab != null) {
// Traverse each node of the old hash table to copy to the new array.
for (int j = 0; j < oldCap; ++j) {
Node<K,V> e;
// Use e to record the head node of the current bucket.
if ((e = oldTab[j]) != null) {
oldTab[j] = null;
if (e.next == null)
newTab[e.hash & (newCap - 1)] = e;
// If the old bucket's collision structure is a red-black tree, split it across the new table.
else if (e instanceof TreeNode)
((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
// If the old bucket's collision structure is a linked list.
else { // preserve order
Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
// In JDK 1.7, nodes were re-inserted into the new array one at a time inside the loop (head insertion), which could create a circular linked list under concurrent resizing.
// In JDK 1.8, each chain is split into lo/hi lists first and assigned to the new array only after the whole loop finishes, so no cycle can form.
do {
next = e.next;
if ((e.hash & oldCap) == 0) {
if (loTail == null)
loHead = e;
else
loTail.next = e;
loTail = e;
}
else {
if (hiTail == null)
hiHead = e;
else
hiTail.next = e;
hiTail = e;
}
} while ((e = next) != null);
// Nodes remain in their original positions.
if (loTail != null) {
loTail.next = null;
newTab[j] = loHead;
}
// Nodes are allocated to "original position + old capacity".
if (hiTail != null) {
hiTail.next = null;
newTab[j + oldCap] = hiHead;
}
}
}
}
}
return newTab;
}
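A worked example of the (e.hash & oldCap) split with illustrative values:
int oldCap = 16, j = 5;                    // old capacity; bucket index j in the old table
int h1 = 0b00101;                          // hash  5: (h1 & oldCap) == 0  -> lo list, stays at index 5
int h2 = 0b10101;                          // hash 21: (h2 & oldCap) == 16 -> hi list, moves to 5 + 16
System.out.println(h1 & (2 * oldCap - 1)); // 5  == j
System.out.println(h2 & (2 * oldCap - 1)); // 21 == j + oldCap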
4. What Conditions Must a Class Meet to Be Used as a HashMap Key?#
Immutable types such as String and wrapper classes like Integer are generally used as keys because:
- They are final and immutable, so a key's hash value cannot change after the entry is stored, keeping lookups accurate.
- They already override the equals() and hashCode() methods.
Therefore, if the key in a HashMap is an object of a custom class, that class must override hashCode() and equals() consistently.
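A minimal sketch of a well-behaved custom key (a hypothetical Point class): immutable fields, with equals() and hashCode() overridden consistently.
import java.util.Objects;

public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() { return Objects.hash(x, y); } // consistent with equals()
}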
5. How Does HashMap Resolve Hash Collisions?#
Methods to Resolve Hash Collisions#
Reference: Analysis of Common Methods to Resolve Hash Collisions
Open Addressing Method#
Starting from the unit where the collision occurred, find an empty unit in the hash table in a certain order. Then store the colliding element in that unit. The required table length must be greater than or equal to the number of elements to be stored.
This includes the following methods:
- Linear Probing: starting from the unit where the collision occurred, check each following unit in order; after the last unit, wrap around to the beginning of the table, until an empty unit is found or every unit has been probed (a minimal sketch follows this list).
- Quadratic Probing: when a collision occurs at unit d[i], probe d[i] + 1², d[i] + 2², d[i] + 3², ... until an empty unit is found.
- Double Hashing: when a collision occurs, use a second hash function to compute the step size for subsequent probes.
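A minimal linear-probing sketch (hypothetical fixed-size table; note that java.util.HashMap itself resolves collisions by chaining, not open addressing):
public class LinearProbingTable {
    private final Integer[] keys;
    private final int[] values;

    public LinearProbingTable(int capacity) {
        keys = new Integer[capacity];
        values = new int[capacity];
    }

    // Assumes the table never becomes completely full.
    public void put(int key, int value) {
        int i = Math.floorMod(Integer.hashCode(key), keys.length);
        // Probe successive units, wrapping around, until an empty unit or the same key is found.
        while (keys[i] != null && keys[i] != key) {
            i = (i + 1) % keys.length;
        }
        keys[i] = key;
        values[i] = value;
    }
}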
Chaining Method (Separate Chaining)#
The idea of the chaining method is to form a singly linked list of elements with the same hash value and store the head pointer of the linked list in the i-th unit of the hash table. Searching, inserting, and deleting mainly occur within the synonym linked list.
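A minimal separate-chaining sketch along the same lines (hypothetical; each bucket stores the head of a singly linked list of synonyms):
public class ChainedTable {
    private static class Node {
        final int key; int value; Node next;
        Node(int key, int value, Node next) { this.key = key; this.value = value; this.next = next; }
    }

    private final Node[] buckets = new Node[16]; // power-of-two capacity

    public void put(int key, int value) {
        int i = Integer.hashCode(key) & (buckets.length - 1);
        for (Node e = buckets[i]; e != null; e = e.next) {
            if (e.key == key) { e.value = value; return; } // key already in the synonym list
        }
        buckets[i] = new Node(key, value, buckets[i]); // head insertion into the chain
    }
}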
Rehashing Method#
This involves constructing multiple different hash functions simultaneously:
Hi = RHi(key), i = 1, 2, ..., k
When H1 = RH1(key) causes a collision, H2 = RH2(key) is used for calculation until no more collisions occur. This method does not easily cause clustering but increases computation time.
6. What Are the Differences Between HashMap in JDK 1.7 and JDK 1.8?#
- JDK 1.8 introduced red-black trees, reducing the worst-case lookup within a bucket from O(n) with a linked list to O(log n).
- When inserting data, JDK 1.7 used head insertion, which could reverse list order and produce circular-linked-list deadlocks during concurrent resizing; JDK 1.8 uses tail insertion to avoid this problem. For details, see: Reasons for Circular Linked Lists in JDK 1.7 HashMap and Solutions in JDK 1.8.
- The perturbation calculation used to determine storage positions differs:
1. JDK 1.7 applied 4 bit-shifts and 5 XOR operations (9 perturbations) to hashCode():
// JDK 1.7 (simplified)
final int hash(Object k) {
    int h = 0;
    h ^= k.hashCode();
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
2. JDK 1.8 simplified the perturbation function to 2 operations: 1 bit-shift plus 1 XOR:
static final int hash(Object key) {
    int h;
    // a. When key == null, the hash value is 0, so HashMap permits a null key.
    //    (By contrast, Hashtable calls hashCode() on the key directly and throws an
    //    exception for a null key, so Hashtable's key cannot be null.)
    // b. When key != null, first compute the key's hashCode() (denoted h), then perturb it:
    //    XOR h with h shifted right by 16 bits, mixing the high bits into the low bits.
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
7. Why Doesn't HashMap Directly Use the Hash Value Processed by hashCode() as the Index of the Table?#
HashMap does not use hashCode() directly; it first applies a perturbation function so that the hash values derived from keys are more evenly distributed and random, reducing the chance of hash collisions.
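A worked example with illustrative hash codes that differ only in their high 16 bits:
int n = 16;                             // table capacity; index = hash & (n - 1)
int h1 = 0x00010001, h2 = 0x00020001;   // differ only in the high half
System.out.println(h1 & (n - 1));       // 1
System.out.println(h2 & (n - 1));       // 1: same bucket without perturbation
int p1 = h1 ^ (h1 >>> 16);              // JDK 1.8 perturbation
int p2 = h2 ^ (h2 >>> 16);
System.out.println(p1 & (n - 1));       // 0
System.out.println(p2 & (n - 1));       // 3: spread across different buckets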
8. Differences Between HashMap, LinkedHashMap, and TreeMap#
- Traversing a HashMap yields its entries in no guaranteed order.
- LinkedHashMap inherits from HashMap, keeping its efficiency while adding an internal doubly linked list that maintains element order.
- TreeMap implements the SortedMap interface, so it keeps its entries sorted.
- LinkedHashMap orders entries by insertion order (or by access order, if so configured), while TreeMap sorts by the keys' inherent ordering (a Comparator or Comparable); see the sketch below.
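A small sketch of the three iteration orders:
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapOrderDemo {
    public static void main(String[] args) {
        String[] keys = {"banana", "apple", "cherry"};
        Map<String, Integer> hash = new HashMap<>();
        Map<String, Integer> linked = new LinkedHashMap<>();
        Map<String, Integer> tree = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            hash.put(keys[i], i);
            linked.put(keys[i], i);
            tree.put(keys[i], i);
        }
        System.out.println(hash.keySet());   // no guaranteed order
        System.out.println(linked.keySet()); // [banana, apple, cherry]: insertion order
        System.out.println(tree.keySet());   // [apple, banana, cherry]: natural (sorted) order
    }
}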
9. HashMap Data Loss Issues#
Reference: HashMap Source Code and Data Loss Issues in Multithreaded Situations