In this post we'll see how HashSet internally works in Java, which is also a favourite
Java Collections interview question but before going into internal implementation of HashSet in Java it is important to know two points about HashSet.
- HashSet in Java only stores unique values i.e. no duplicates are allowed.
-
HashSet works on the concept of hashing just like HashMap in Java but its working differs from the HashMap in the following way-
- In HashMap a (Key, Value) pair is added and the hash function is calculated using key.
- Where as in the HashSet hash function is calculated using the value itself. Note that in HashSet we
have add(E e) method which takes just the element to be added as parameter.
Also you may have guessed by now, since hash function is calculated using value that is why only unique
values are stored in the HashSet. If you try to store the
same element again, the calculated hash function would be same, thus the element will be overwritten.
HashSet internally uses HashMap
Now coming back to internal implementation of HashSet in Java the most important point is HashSet class implementation internally uses HashMap to store it's elements.
Within the HashSet there are many constructors one without any parameter and several more with initial capacity or load factor but each one of these constructor creates a HashMap.
Since HashSet internally uses HashMap so knowing how HashMap works internally in Java will help you to understand how HashSet works internally in Java.
HashSet Constructor snippets
In the HashSet class in Java you can see that constructors of the class do create a HashMap.
/**
* Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
* default initial capacity (16) and load factor (0.75).
*/
public HashSet() {
map = new HashMap<>();
}
public HashSet(int initialCapacity, float loadFactor) {
map = new HashMap<>(initialCapacity, loadFactor);
}
And the map, which is used for storing values, is defined as
private transient HashMap<E,Object> map;
In the constructor, if you have noticed, there are parameters named initial capacity and load factor.
For HashSet, default initial capacity is 16, that is an array (or bucket) of length 16 would be created and
default load factor is 0.75. Where load factor is a measure of how full the hash table is allowed to get before
its capacity is automatically increased.
How elements are added - HashSet internal implementation
I stated in the point 2 above that HashSet calculates the hash function using value itself and there
is no (Key, Value) pair in HashSet and then came the statement that HashSet internally uses HashMap to store objects.
These two statements may sound contradictory as HashMap stores (key, value) pair so let's see how these these two
contradictory statements hold true.
Actually from add method of HashSet class put() method of HashMap is called where the value, which has to be
added in the Set, becomes Key and a constant object "PRESENT" is used as value.
That's how PRESENT is defined in HashSet implementation-
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
And that's how add method is implemented in HashSet class -
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
So you can see with in the internal implementation of the HashSet it's a (key, value) pair which is actually getting added.
It's just that the actual value (which is added to the HashSet) becomes the key and a dummy value "PRESENT" is added as value when
storing it in the backing HashMap.
For example a statement for adding an element to HashSet- set.add("Mumbai"); internally translates into map.put("Mumbai", PRESENT); and then added to the backing HashMap instance.
One thing to note here is, in HashMap value may be duplicate but Key should be unique. That's how HashSet
makes sure that only unique values are stored in it, since the value which is to be stored in the HashSet
becomes the key while storing it in HashMap.
How element is removed - HashSet internal implementation
When we need to remove an element from the HashSet, internally again remove method of HashSet
calls remove(Object key) method of the HashMap.
That is how it is implemented in HashSet class.
public boolean remove(Object o) {
return map.remove(o)==PRESENT;
}
Here note that remove(Object key) method of the HashMap returns the Value associated with the key.
Whereas the remove(Object o) method of the HashSet returns Boolean value. Also we know that for every value
added in HashSet, internally when it is added to the associated HashMap, value becomes Key and the value is always an
object called PRESENT. Therefore the value that is returned from the remove(Object key) method of the HashMap is always PRESENT thus the condition map.remove(o)==PRESENT.
How elements are retrieved from HashSet in Java
In HashSet there is no get method as provided in Map or List. In HashSet iterator is there which will iterate
through the values of the Set. Internally it will call the keyset of the HashMap, as values are stored as keys
in the HashMap so what we'll get is the values stored in the HashSet.
That's how iterator is internally implemented in the HashSet in Java.
/**
* Returns an iterator over the elements in this set. The elements
* are returned in no particular order.
*
* @return an Iterator over the elements in this set
* @see ConcurrentModificationException
*/
public Iterator<E> iterator() {
return map.keySet().iterator();
}
Points to note
- Unlike HashMap where hash function is calculated using key HashSet uses the value itself to calculate the hash function.
- Since hash function is calculated using value that is why only unique values are stored in the HashSet.
- HashSet internally uses HashMap to store its elements.
- When element is added to HashSet using add(E e) method internally HashSet calls put() method of the HashMap where
the value passed in the add method becomes key in the put() method. A dummy value “PRESENT” is passed as value in
the put() method.
Recommendations for learning (Udemy Courses)
- Java Programming Masterclass Course
- Java In-Depth: Become a Complete Java Engineer!
- Spring Framework Master Class Course
- Complete Python Bootcamp Course
- Python for Data Science and Machine Learning
That's all for this topic How HashSet Works Internally in Java. If you have any doubt or any suggestions to make please drop a comment. Thanks!
Related Topics
-
How ArrayList Works Internally in Java
-
How HashMap Works Internally in Java
-
How LinkedList Class Works Internally in Java
-
LinkedHashSet in Java With Examples
-
TreeSet in Java With Examples
You may also like-
-
EnumSet in Java With Examples
-
Java Collections Interview Questions And Answers
-
Varargs (Variable-length Arguments) in Java
-
finalize method in Java
-
Interface Static Methods in Java
-
Method Reference in Java
-
String Vs StringBuffer Vs StringBuilder in Java
-
Why wait(), notify() And notifyAll() Must be Called Inside a Synchronized Method or Block