Sunday, August 29, 2021

HashSet in Java With Examples

HashSet in Java is the implementation of the Set interface and it is an unordered collection meaning insertion order is not maintained in a HashSet.

Following are the important points about the HashSet in Java.
  1. HashSet is part of Java Collections framework. HashSet class extends AbstractSet and implements Set, Cloneable and Serializable interfaces.
  2. HashSet is an unordered collection. A hash is calculated using the element that is added to the HashSet and that decides where the element will be stored.
  3. HashSet only stores unique elements, meaning duplicates are nor allowed.
  4. HashSet permits one null element to be added.
  5. HashSet implementation is not synchronized hence not thread safe. If HashSet is to be used in a multi-threaded environment where it is accessed and modified concurrently then it must be synchronized externally. That can be done by wrapping the set with in Collections.synchronizedSet method.
  6. The iterators returned by HashSet's iterator method are fail-fast: if the set is modified at any time after the iterator is created, in any way except through the iterator's own remove method, the Iterator throws a ConcurrentModificationException.

Java HashSet constructors

HashSet class in Java provides the following constructors.
  • HashSet()- Constructs a new, empty set; the backing HashMap instance has default initial capacity (16) and load factor (0.75).
  • HashSet(int initialCapacity)- Constructs a new, empty set; the backing HashMap instance has the specified initial capacity and default load factor (0.75).
  • HashSet(int initialCapacity, float loadFactor)- Constructs a new, empty set; the backing HashMap instance has the specified initial capacity and the specified load factor.
  • HashSet(Collection<? extends E> c)- Constructs a new set containing the elements in the specified collection.

Java HashSet example

Let’s see an example creating a HashSet and adding elements to it to clear some of the points mentioned above.

import java.util.HashSet;
import java.util.Set;

public class HashSetDemo {
  public static void main(String[] args) {
    // creating a HashSet
    Set<String> citySet = new HashSet<String>();
    // Adding elements
    citySet.add("London");
    citySet.add(null);
    citySet.add("Tokyo");
    citySet.add("New Delhi");
    citySet.add("Beijing");
    citySet.add("Nairobi");
    citySet.add("New Delhi");
    citySet.add(null);
    // iterating HashSet
    for(String city : citySet){
      System.out.println("City- " + city);        
    }
  }
}

Output

City- null
City- Beijing
City- New Delhi
City- Nairobi
City- Tokyo
City- London

In the code this line Set<String> citySet = new HashSet<String>(); creates a HashSet of the default capacity.

Since all the collections are generic so you can specify the type of elements the HashSet is going to store at the creation time itself. HashSet created here is supposed to store Strings only.

From the output you can also verify the following-

  • Insertion order is not maintained in the HashSet.
  • Duplicates are also not allowed, New Delhi though inserted twice is added only once.
  • Null element is added but only once.

What is Initial capacity and Load factor

While mentioning the constructors of the HashSet you would have seen two words used Initial capacity and load factor. These two parameters affect the performance of the HashSet so let’s try to have an understanding of these two parameters.

HashSet internally uses HashMap to store its elements and HashMap has an array where the elements are ultimately stored. Each array index is referred as a bucket in the HashTable.

The capacity is the number of buckets in the hash table, and the initial capacity is the capacity at the time the hash table is created. If you don't specify the initial capacity then the array has a default capacity of 16. If you specify the initial capacity then the array will be created with that capacity

Initial capacity may not be enough once more and more elements are added to the Set and there may be a need to increase the capacity. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets. Default load factor is 0.75.

Java HashSet methods

  • add(E e)- Adds the specified element to this set if it is not already present.
  • clear()- Removes all of the elements from this set.
  • contains(Object o)- Returns true if this set contains the specified element.
  • isEmpty()-Returns true if this set contains no elements.
  • iterator()- Returns an iterator over the elements in this set.
  • remove(Object o)- Removes the specified element from this set if it is present.
  • size()- Returns the number of elements in this set (its cardinality).
  • spliterator()- Creates a late-binding and fail-fast Spliterator over the elements in this set.

Java HashSet is not synchronized

HashSet is not thread safe. If you are using HashSet in multithreaded environment and at least one of the threads modifies the set, it must be synchronized externally. You can use Collections.synchronizedSet() method to synchronize a Set. This method returns a synchronized (thread-safe) set backed by the specified set.

Set s = Collections.synchronizedSet(new HashSet());

Java HashSet iterator

The iterators returned by iterator method of the HashSet class are fail-fast, if the set is modified at any time after the iterator is created, in any way except through the iterator's own remove method, the Iterator throws a ConcurrentModificationException.

Example code

In the Java example program while iterating a HashSet using iterator, Set’s remove method is used to remove an element from the set so the program throws ConcurrentModificationException.

public class HashSetDemo {
  public static void main(String[] args) {
    // creating a HashSet
    Set<String> citySet = new HashSet<String>();
    // Adding elements
    citySet.add("London");        
    citySet.add("Tokyo");
    citySet.add("New Delhi");
    citySet.add("Beijing");
    citySet.add("Nairobi");
        
    Iterator<String> itr = citySet.iterator();
    while(itr.hasNext()){     
      String city = itr.next();
      if(city.equals("Beijing")){
        // Using Set's remove() method
        citySet.remove(city);
      }
      System.out.println("city- " + city);        
    }
  }
}

Output

city- Beijing
Exception in thread "main" java.util.ConcurrentModificationException
 at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429)
 at java.util.HashMap$KeyIterator.next(HashMap.java:1453)
 at org.netjs.prog.HashSetDemo.main(HashSetDemo.java:21)

Example using iterator’s remove method

If we use the same example and use iterator’s remove() method to remove an element while iterating ConcurrentModificationException won’t be thrown.

public class HashSetDemo {

  public static void main(String[] args) {
    // creating a HashSet
    Set<String> citySet = new HashSet<String>();
    // Adding elements
    citySet.add("London");        
    citySet.add("Tokyo");
    citySet.add("New Delhi");
    citySet.add("Beijing");
    citySet.add("Nairobi");
        
    Iterator<String> itr = citySet.iterator();
    while(itr.hasNext()){     
      String city = itr.next();
      if(city.equals("Beijing")){
        // Using Set's remove() method
        itr.remove();
      }
      System.out.println("city- " + city);        
    }
    System.out.println("--HashSet after element removal--");
    for(String city : citySet){
      System.out.println("city- " + city);
    }
  }
}

Output

city- Beijing
city- New Delhi
city- Nairobi
city- Tokyo
city- London
--HashSet after element removal--
city- New Delhi
city- Nairobi
city- Tokyo
city- London

Performance of HashSet

HashSet class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Since the elements are stored (or to check if exists using contains method) by calculating a hash. That hash value decides in which bucket value will be stored. Since there may be more than one value in a bucket then it's checked sequentially to get the corrct value. That may tempt you to increase the initial capacity so that there are more buckets.

But having a lot of buckets will increase the iteration time as iterating over a set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the number of buckets in the HashTable. Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

Reference: https://docs.oracle.com/javase/10/docs/api/java/util/HashSet.html

That's all for this topic HashSet in Java With Examples. If you have any doubt or any suggestions to make please drop a comment. Thanks!


Related Topics

  1. LinkedHashSet in Java With Examples
  2. TreeSet in Java With Examples
  3. CopyOnWriteArraySet in Java With Examples
  4. How to Sort HashSet in Java
  5. Java Collections Interview Questions And Answers

You may also like-

  1. Difference Between Comparable and Comparator in Java
  2. How LinkedList Class Works Internally in Java
  3. How HashMap Works Internally in Java
  4. How to Create Immutable Class in Java
  5. instanceof Operator in Java With Examples
  6. Lambda Expressions in Java 8
  7. Serialization and Deserialization in Java
  8. How to Inject Null And Empty String Values in Spring

No comments:

Post a Comment