Friday, July 2, 2021

How HashSet Works Internally in Java

In this post we'll see how HashSet internally works in Java, which is also a favourite Java Collections interview question but before going into internal implementation of HashSet in Java it is important to know two points about HashSet.

  1. HashSet in Java only stores unique values i.e. no duplicates are allowed.
  2. HashSet works on the concept of hashing just like HashMap in Java but its working differs from the HashMap in the following way-
    • In HashMap a (Key, Value) pair is added and the hash function is calculated using key.
    • Where as in the HashSet hash function is calculated using the value itself. Note that in HashSet we have add(E e) method which takes just the element to be added as parameter.
    Also you may have guessed by now, since hash function is calculated using value that is why only unique values are stored in the HashSet. If you try to store the same element again, the calculated hash function would be same, thus the element will be overwritten.

HashSet internally uses HashMap

Now coming back to internal implementation of HashSet in Java the most important point is HashSet class implementation internally uses HashMap to store it's elements.

Within the HashSet there are many constructors one without any parameter and several more with initial capacity or load factor but each one of these constructor creates a HashMap. Since HashSet internally uses HashMap so knowing how HashMap works internally in Java will help you to understand how HashSet works internally in Java.

HashSet Constructor snippets

In the HashSet class in Java you can see that constructors of the class do create a HashMap.

/**
* Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
* default initial capacity (16) and load factor (0.75).
*/
public HashSet() {
  map = new HashMap<>();
}
public HashSet(int initialCapacity, float loadFactor) {
  map = new HashMap<>(initialCapacity, loadFactor);
}

And the map, which is used for storing values, is defined as

private transient HashMap<E,Object> map;

In the constructor, if you have noticed, there are parameters named initial capacity and load factor. For HashSet, default initial capacity is 16, that is an array (or bucket) of length 16 would be created and default load factor is 0.75. Where load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.

How elements are added - HashSet internal implementation

I stated in the point 2 above that HashSet calculates the hash function using value itself and there is no (Key, Value) pair in HashSet and then came the statement that HashSet internally uses HashMap to store objects. These two statements may sound contradictory as HashMap stores (key, value) pair so let's see how these these two contradictory statements hold true.

Actually from add method of HashSet class put() method of HashMap is called where the value, which has to be added in the Set, becomes Key and a constant object "PRESENT" is used as value.

That's how PRESENT is defined in HashSet implementation-

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

And that's how add method is implemented in HashSet class -

public boolean add(E e) {
  return map.put(e, PRESENT)==null;
} 

So you can see with in the internal implementation of the HashSet it's a (key, value) pair which is actually getting added. It's just that the actual value (which is added to the HashSet) becomes the key and a dummy value "PRESENT" is added as value when storing it in the backing HashMap.

For example a statement for adding an element to HashSet- set.add("Mumbai"); internally translates into map.put("Mumbai", PRESENT); and then added to the backing HashMap instance.

One thing to note here is, in HashMap value may be duplicate but Key should be unique. That's how HashSet makes sure that only unique values are stored in it, since the value which is to be stored in the HashSet becomes the key while storing it in HashMap.

How element is removed - HashSet internal implementation

When we need to remove an element from the HashSet, internally again remove method of HashSet calls remove(Object key) method of the HashMap.

That is how it is implemented in HashSet class.

public boolean remove(Object o) {
  return map.remove(o)==PRESENT;
}

Here note that remove(Object key) method of the HashMap returns the Value associated with the key. Whereas the remove(Object o) method of the HashSet returns Boolean value. Also we know that for every value added in HashSet, internally when it is added to the associated HashMap, value becomes Key and the value is always an object called PRESENT. Therefore the value that is returned from the remove(Object key) method of the HashMap is always PRESENT thus the condition map.remove(o)==PRESENT.

How elements are retrieved from HashSet in Java

In HashSet there is no get method as provided in Map or List. In HashSet iterator is there which will iterate through the values of the Set. Internally it will call the keyset of the HashMap, as values are stored as keys in the HashMap so what we'll get is the values stored in the HashSet.

That's how iterator is internally implemented in the HashSet in Java.

/**
* Returns an iterator over the elements in this set. The elements
* are returned in no particular order.
*
* @return an Iterator over the elements in this set
* @see ConcurrentModificationException
*/
public Iterator<E> iterator() {
  return map.keySet().iterator();
}

Points to note

  1. Unlike HashMap where hash function is calculated using key HashSet uses the value itself to calculate the hash function.
  2. Since hash function is calculated using value that is why only unique values are stored in the HashSet.
  3. HashSet internally uses HashMap to store its elements.
  4. When element is added to HashSet using add(E e) method internally HashSet calls put() method of the HashMap where the value passed in the add method becomes key in the put() method. A dummy value “PRESENT” is passed as value in the put() method.

Recommendations for learning (Udemy Courses)

  1. Java Programming Masterclass Course
  2. Java In-Depth: Become a Complete Java Engineer!
  3. Spring Framework Master Class Course
  4. Complete Python Bootcamp Course
  5. Python for Data Science and Machine Learning

That's all for this topic How HashSet Works Internally in Java. If you have any doubt or any suggestions to make please drop a comment. Thanks!


Related Topics

  1. How ArrayList Works Internally in Java
  2. How HashMap Works Internally in Java
  3. How LinkedList Class Works Internally in Java
  4. LinkedHashSet in Java With Examples
  5. TreeSet in Java With Examples

You may also like-

  1. EnumSet in Java With Examples
  2. Java Collections Interview Questions And Answers
  3. Varargs (Variable-length Arguments) in Java
  4. finalize method in Java
  5. Interface Static Methods in Java
  6. Method Reference in Java
  7. String Vs StringBuffer Vs StringBuilder in Java
  8. Why wait(), notify() And notifyAll() Must be Called Inside a Synchronized Method or Block

9 comments:

  1. very nicely explained..kudos !!

    ReplyDelete
  2. Here is the doubt, you said "HashSet, from add method, put method of HashMap is called where the value which has to be added in the Set becomes Key and a constant object PRESENT is used as value", but in Hashmap it works on hashing concept change it key into hashcode and save it in a bucket, we can also have same hashcode for two different keys in hashmap but in hashmap if 2 different key has same hashmap, then it use equals() method to find exact value, what in case of HashSet if we have same hashcode for 2 different values even if it try to use equals() method, it save PRESENT as value.
    what will happen in this case. Could you please clarify.

    ReplyDelete
    Replies
    1. Well there is no get() method in HashSet so your scenario will never occur, please read "How Object is retrieved from HashSet" section of the post. All you can get is a iterator which will call keyset of the HashMap so you will get set of keys in the HashMap which incidentally means elements of the HashSet.

      Delete
  3. Your explanation directly hits the brain...Thanq :)

    ReplyDelete
  4. Very good tutorial, for all implementations. Suggestions: include comparison of all collections, feature like when to use which collection, like arrays should be used for faster access and is slow during insertion etc.

    ReplyDelete
  5. All topics are very well explained.Should have search option in this blog.

    ReplyDelete
  6. I have a doubt. When we say hashset cannot have duplicates because hashcode is called on the value itself instead of key, then what about hash collision. What if 2 values have the same hashcode? Since it is a possible case in HashMap, where 2 different key’s can have same hashcode, can’t this happen for same 2 different values in case of HashSet?

    ReplyDelete
    Replies
    1. Yes that could be a possible scenario for a poorly implemented hashcode. If two different values have the same hashcode then latest value addition will overwrite the old one despite values being different.

      Delete