How Hashmap works Internally in Java
In this article, we are going to see how HashMap internally works in java. Also, we will have a look at what Java 8 made changes on the internal working of Hashmap to make it faster.
What is Hashmap and Hashing in java?
How Hashmap works Internally in Java is majorly dependent upon the Hashing Principle. So, Before going to learn how HashMap works internally in java, lets first understand what is HashMap and hashing.
HashMap :
A HashMap is a map used to store mappings of key-value pairs. Also, it works on the Principle of Hashing. To know more about the HashMap, visit this article: HashMap in JavaHashing Principle :
Simply, Hashing is a method used to produce an integer value from an object and this integer value known as the hash value. In HashMap, the key object is used for Hashing.Internal Struture of the HashMap in java
For internal working of HashMap, HashMap maintains an array of bucket, each bucket is a linked-list and linked list is a list of nodes wherein each node contains key-value pair.
Note : After Java 8, a bucket can be a linked-list or a binary tree depending upon the Threshold. We will discuss this later in the post.
Array of Bucket or Bucket Table :
The array of bucket is also known as the Bucket Table of the HashMap in Java. HashMap uses hashing method to calculate the index of array or which Bucket is going to be used. Internally Bucket array defined asInternal Representation of Bucket Array
// Each Node of the Array is a Bucket
Node<K,V>[] table;
Bucket :
A bucket is a linked list of nodes where each node is an instance of class Node<K,V>. The key of the node is used to generate the hash value and this hash value is used to find the bucket from Bucket Table.
Note: When two or more keys generate the same hash, then it called a Hash Collision. In that case, the new node adds to the linked list. We will discuss more this later in the post that how HashMap works internally in java to handle Hash Collision.
Node :
Each node of the Linked List is an instance of class Node<K,V> and actually store key-value pair that we are storing in the HashMap. This class is an static inner class of HashMap class and it implements the Map.Entry<K,V> interface. The representation of this Node as :Inner Static Node class of HashMap
static class Node<K,V> implements Map.Entry<K,V> {
final int hash;
final K key;
V value;
Node<K,V> next;
}
- final int hash : hash value of the key
- final K key : Key of the node
- V value : Value of the node
- Node<K,V> next : Pointer to the next node present in the bucket or linked-list
How Hashmap Calculate the Index of Bucket in java?
To understand how HashMap works internally in Java, we must know about how the HashMap calculates the index of the bucket. Till now, we know the internal structure of HashMap, that HashMap maintains an array of the bucket. But when we store or retrieve any key-value pair, HashMap calculates the index of the bucket for each and every operation.
The Key object is used to calculate the index of the bucket. By using this key, hash value is calculated using hash(key) private method of HashMap.
The Key object is used to calculate the index of the bucket. By using this key, hash value is calculated using hash(key) private method of HashMap.
Note: hash(key) method is a private method of HashMap that returns the hash value of the key, also if the hash value is too large then converts it into a smaller hash value.
But what will happen, if hash value of Key Object returns the integer that is greater than the size of the array i.e., hash(key) > n, then ArrayOutOfBoundsException could be raised. To handle this situation, HashMap reduces the hash value between 0 and n-1 using an expression :
Index Calculating Expresssion
index = hash(key) & (n-1)
Now, this index value is generated is used by HashMap to find bucket location and can never generate any Exception as the index value always from 0 to n-1.
Role of hashCode() and equals() in HashMap
The hashcode() and equals() have a major role in how HashMap works internally in java because each and every operation provided by the HashMap uses these methods for producing results.
For any given key, the hash(key) private method of HashMap used to calculate the hash value. Internally, the hash(key) method uses hashCode() method of the key object to get the hash value.
hashCode() :
The hashCode() method puts a major impact on how HashMap works internally in java as it is used to find the bucket location from the Bucket Table.For any given key, the hash(key) private method of HashMap used to calculate the hash value. Internally, the hash(key) method uses hashCode() method of the key object to get the hash value.
Note :
a) If the key is null then the hash value returned by the hash(key) private method of HashMap will be 0.
b) If the key.hashCode() is too large then hash(key) private method of HashMap converts it into smaller number.
a) If the key is null then the hash value returned by the hash(key) private method of HashMap will be 0.
b) If the key.hashCode() is too large then hash(key) private method of HashMap converts it into smaller number.
equals() :
The equals() method also puts an impact on how HashMap works internally in java as it is used to check whether the key is present or not in the Bucket after the bucket is determined from the Bucket Table or Bucket Array.Or In other words, When the bucket location is calculated then each node of the bucket is traversed and compared with the key object to check whether the key is present or not in the Bucket.
So, we can say hashCode() is used to find which bucket and equals() is used for key uniqueness.
How put() method of Hashmap works internally in java?
The put() method is a method of HashMap used to store key-value pair.
Syntax of put() method
hashmap.put(key, value);
Working of put() method :
1. The key is used to calculate the hash value by calling private hash(key) method, internally hash(key) method call hashCode() method of key.
hash = hash(key);
Note :
HashMap can store one null value as the key. In that case, the hash value returned by the hash(key) method will be 0 and 0th bucket location will be used.
2. The above hash is reduced from 0 to n-1 to calculate the index of bucket (where n is the size of an array of the bucket).
index = hash & (n-1);
3. A new instance of Node<K,V> class is created.
Node<K,V> newNode = new Node(hash , key , value , null); // null for next pointer
4. The index of the bucket is used to fetch the bucket, then the new node is added to the fetched bucket.
How Hash Collision is resolved?
When the hashCode() method of two or more key generate the same value, then hash collision will occur. To resolve this issue, HashMap uses the concept of linked-list.
- When a new key and the key already in the HashMap generate the same hash value : then HashMap indexing the same bucket (bucket that already contains nodes in the form of linked-list). Then, the new node added at the last of the linked-list. The next pointer of Node class is used to create Linked-List.
- When a new value object is added with an existing key: logically, HashMap replaces the old value with new value related to the key. To do this, index of the bucket will be calculated (which will be the same for the key), then the indexed bucket or linked-list is traversed and key of the each node(k) is compared with the key by equals() method i.e., key.equals(k). When the equals() method returns true then the value of that node is replaced with the new value.
Time Complexity of put() method
HashMap store key-value pair in constant time which is O(1) as it indexing the bucket and add the node. But in worst case, it can be O(n) when all node returns same hashCode and added into the same bucket then traversal cost of n nodes will be O(n) but after the changes made by java 8 it can be maximum of O(log n).How get() method of HashMap works internally in java?
The get() method is a method of HashMap used to retrieve value by providing the key.
Syntax of get() method
value = hashmap.get(key);
Working of get() method :
1. The key is used to calculate the hash value by calling private hash(key) method, internally hash(key) method call hashCode() method of key.
hash = hash(key);
Note :
HashMap can store one null value as the key. In that case, the hash value returned by the hash(key) method will be 0 and 0th bucket location will be used.
2. The above hash is reduced from 0 to n-1 to calculate the index of bucket (where n is the size of array of bucket).
index = hash & (n-1);
3. If the bucket is null, then null will be returned.
Else the first node of the indexed bucket is fetched and key of that node(k) is compared with the key (provided in the get(key) method of HashMap), comparison is done by equals() method i.e., key.equals(k).
If the equals() method returned true then value of that node is returned otherwise next pointer of the node is used to fetched the next node in the list and apply the same procedure to that node. This process goes on till equals() returned true or next pointer is null. In case of next pointer is null then null is returned (which represents key is not present in the bucket or we can say not present in the HashMap).
Else the first node of the indexed bucket is fetched and key of that node(k) is compared with the key (provided in the get(key) method of HashMap), comparison is done by equals() method i.e., key.equals(k).
If the equals() method returned true then value of that node is returned otherwise next pointer of the node is used to fetched the next node in the list and apply the same procedure to that node. This process goes on till equals() returned true or next pointer is null. In case of next pointer is null then null is returned (which represents key is not present in the bucket or we can say not present in the HashMap).
Time Complexity of get() method
Similar to put() method, HashMap retrieves value by key in constant time which is O(1) as it indexing the bucket and add the node. But in worst case, it can be O(n) when all node returns same hashCode and added into the same bucket then traversal cost of n nodes will be O(n) but after the changes made by java 8 it can be maximum of O(log n).Load Factor and Initial Capacity of HashMap in java
Load Factor and Initial Capacity are two important factors that govern how HashMap works internally in java.
Initial Capacity is a measure used internally by HashMap that represents the number of bucket or size of the bucket array at the time of creation of HashMap. By default the initial capacity is 2^{4} = 16.
Load Factor is a factor that is used by HashMap internally to decide when the size of Bucket array needs to be increased. It is 0.75 by default i.e., when the number of nodes in the HashMap is more than 75% of Total Capacity then the HashMap grows its bucket array size.
Initial Capacity is a measure used internally by HashMap that represents the number of bucket or size of the bucket array at the time of creation of HashMap. By default the initial capacity is 2^{4} = 16.
Load Factor is a factor that is used by HashMap internally to decide when the size of Bucket array needs to be increased. It is 0.75 by default i.e., when the number of nodes in the HashMap is more than 75% of Total Capacity then the HashMap grows its bucket array size.
Note : The capacity of HashMap always doubled each time when HashMap needs to increased its bucket array size i.e., 2^{4} = 16 (initially), 2^{5} = 32, 2^{6} = 64 etc.
When to increase the size of bucket array in the HashMap?
As we know, both load factor and available capacity together is used by HashMap to decide when to increase the size of bucket array.
Load Factor : 0.75
Initial Capacity : 16 (Available Capacity initially)
Load Factor * Available Capacity = 0.75 * 16 = 12
So, at the time when 12+1 = 13^{th} key-value pair is added to the HashMap, then HashMap grow its bucket array size i.e, 16*2 = 32.
Now Available capacity : 32
----------------------------------
Next Time when
Load Factor * Available Capacity = 0.75 * 32 = 24 i.e., 25^{th} key-value pair is added to the HashMap, then HashMap grow its bucket array size i.e, 32*2 = 64 and so on.
Initial Capacity : 16 (Available Capacity initially)
Load Factor * Available Capacity = 0.75 * 16 = 12
So, at the time when 12+1 = 13^{th} key-value pair is added to the HashMap, then HashMap grow its bucket array size i.e, 16*2 = 32.
Now Available capacity : 32
----------------------------------
Next Time when
Load Factor * Available Capacity = 0.75 * 32 = 24 i.e., 25^{th} key-value pair is added to the HashMap, then HashMap grow its bucket array size i.e, 32*2 = 64 and so on.
Concept of Rehashing :
When HashMap grows its bucket array size, then Rehashing is done. Re-Hashing is a process where bucket index is calculated for each node again w.r.t. the updated size of the bucket array.
Changes made by Java 8 in internal working of HashMap
How HashMap works internally in java 8 is a little bit different from prior versions of java. HashMap in java 8, maintains a value called TREEIFY_THRESHOLD, it is an Integer Constant and currently the value of TREEIFY_THRESHOLD is 8. It is used as whenever in any bucket the number of nodes becomes more than this Threshold value then the data structure of that bucket is convert from linked-list to balanced tree. This change is done to make put-get operation faster as the linked-list provides O(n) time complexity while the time required to traverse the balance tree is O(log n). So that's how java 8 made HashMap much faster than its old version.
Related Article
- HashMap in Java
- HashSet in Java
- Lambda Expression in Java
- How to iterate HashMap in Java?
- How to sort HashMap by key and by value in Java?
So, this is all about how HashMap works internally in Java. In this post, we learn what is hashing, the internal structure of hashmap, how HashMap works internally in java to store and retrieve key-value pair and the changes made by java 8.
This is important to understand the internal working of HashMap as in most of the interview questions are related to HashMap, also HashMap is used in our day-to-day programming so with the better understanding we can use the HashMap efficiently.
This is important to understand the internal working of HashMap as in most of the interview questions are related to HashMap, also HashMap is used in our day-to-day programming so with the better understanding we can use the HashMap efficiently.
Nice blog on how hashmap works internally in java.Really a good source for beginers to start and explore this deep concept.
ReplyDeleteThe way you explained is tremendous. Great Article
ReplyDelete