One of the most important and most widely used collections in Java is java.util.HashMap. This is the first post in a series describing the implementation details of java.util.HashMap.
java.util.HashMap is a hash table based implementation of the java.util.Map interface. Some important features of this implementation:
- it permits null keys and null values
- it makes no guarantee about the internal order of elements; in particular, the internal order can change as the collection grows
- O(1) complexity (constant time, independent of collection size) for the get/put methods, assuming a good hash function
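The features above can be seen in a minimal usage sketch (the class name and values here are illustrative, not from the text):

```java
import java.util.HashMap;
import java.util.Map;

final class BasicUsage {
    static Integer lookup() {
        Map<String, Integer> map = new HashMap<>();
        map.put("alice", 30);
        map.put(null, 0);      // a null key is permitted
        map.put("alice", 31);  // put with an equal key replaces the old value
        return map.get("alice");
    }
}
```

Calling `BasicUsage.lookup()` returns 31: the second put with the key "alice" replaced the first mapping.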
Two methods must be carefully implemented if an element is intended to be used as a key in java.util.HashMap:
- hashCode (defines a hash function over the element, mapping it to an integer)
- equals (defines equality of elements)
There is a contract between hashCode and equals: if two elements are equal, their hashCode values must be the same. The inverse does not hold: two elements with the same hashCode are not necessarily equal.
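A hypothetical key class honoring this contract might look as follows (Point and its fields are assumptions for illustration):

```java
import java.util.Objects;

// An immutable key class that satisfies the hashCode/equals contract.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // Equal points always produce the same hash code;
        // unequal points may still collide, which the contract allows.
        return Objects.hash(x, y);
    }
}
```

Note that both methods are derived from the same fields; overriding only one of them is a common source of broken HashMap keys.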
Internally, a HashMap is an array of a certain capacity. Each non-null element of this array is the head of a linked list in which the data elements themselves are stored. When an element is put into a HashMap, the array index is computed from the element's hash code. If there is already a linked list at that position, we iterate through it looking for an equal element (equality is checked by calling equals). If an equal element is found, the new element replaces it. If no equal element is found, the new element is appended to the end of the linked list. Each list contains elements with similar hash codes (similar, because the hash codes are mixed and the division remainder is taken).
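The index computation can be sketched roughly as below. This is a simplified illustration, not the exact JDK code; the real mixing function varies between JDK versions, but the idea of mixing the bits and masking by the table size is the same:

```java
// A sketch of deriving a bucket index from a key's hash code,
// assuming the table capacity is a power of two (as in java.util.HashMap).
final class BucketIndex {
    // Spread higher bits of the hash down into the low bits,
    // similar in spirit to HashMap's internal mixing step.
    static int mix(int h) {
        return h ^ (h >>> 16);
    }

    // With a power-of-two capacity, masking by (capacity - 1)
    // is equivalent to taking the division remainder.
    static int indexFor(int hash, int capacity) {
        return mix(hash) & (capacity - 1);
    }
}
```

The mixing step matters because only the low bits of the hash participate in the mask, so hash codes that differ only in their high bits would otherwise all land in the same bucket.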
Hence the importance of a good hash function is clear. If the hash function is bad (e.g. it does not distribute hash codes uniformly), hash map performance degrades. In the worst case, the hash function always returns the same number regardless of object state; hash map performance is then the same as a linked list's, i.e. O(n) (proportional to collection size).
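This worst case is easy to reproduce with a deliberately bad key class (BadKey is a made-up name for illustration):

```java
// A key whose hashCode ignores object state: every instance collides,
// so all entries pile up in a single bucket and lookups degrade to O(n).
final class BadKey {
    private final int id;

    BadKey(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }

    @Override
    public int hashCode() {
        return 42; // legal per the contract, but destroys HashMap performance
    }
}
```

The map still works correctly with such keys, because equals disambiguates them inside the bucket; only performance suffers.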
There are two parameters that affect HashMap performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data
structures are rebuilt) so that the hash table has approximately twice the number of buckets. The default initial capacity is 16, and the default load factor is 0.75 (which offers a good tradeoff between time and space costs).
If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. So, when creating a HashMap to store a known number of elements, it is good practice to set its initial capacity to 1.1*N/0.75, where N is the number of elements (assuming the default load factor).
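The sizing rule above can be wrapped in a small factory method (PreSized and newMapFor are hypothetical names, and the +1 guards against truncation when casting to int):

```java
import java.util.HashMap;
import java.util.Map;

// Pre-sizes a HashMap for a known element count, applying the
// 1.1 * N / 0.75 rule of thumb with the default load factor.
final class PreSized {
    static <K, V> Map<K, V> newMapFor(int expectedSize) {
        int initialCapacity = (int) (1.1 * expectedSize / 0.75) + 1;
        return new HashMap<>(initialCapacity);
    }
}
```

With this, inserting the expected number of elements should not trigger an intermediate rehash.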
java.util.HashMap implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map.
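One common way to get such an encapsulating object is the Collections.synchronizedMap wrapper; wrapping at creation time ensures no thread ever accesses the raw, unsynchronized map (the holder class here is an assumed example):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class SharedMapHolder {
    // Wrap the HashMap at creation time, before any thread can see it,
    // so every access goes through the synchronized wrapper.
    static final Map<String, Integer> SHARED =
            Collections.synchronizedMap(new HashMap<>());
}
```

Note that compound operations (check-then-put, iteration) still require the caller to synchronize on the wrapper explicitly; the wrapper only makes individual method calls atomic.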
The iterators returned by all of this class’s “collection view methods” are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator’s own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
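The fail-fast behavior is easy to demonstrate even in a single thread, since any structural modification during iteration (other than via the iterator's own remove) trips the check (FailFastDemo is an illustrative name):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

final class FailFastDemo {
    // Returns true if the iterator detected the concurrent modification.
    static boolean modificationDetected() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.remove("a"); // structural modification during iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```

To remove entries safely while iterating, use Iterator.remove (or, on modern JDKs, Map.remove outside the loop after collecting the keys).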