Collections: HashSet & TreeSet
In Java, you may find yourself wanting to use a Set
rather than a List
. With a List
, you can specify the order of the elements within. To find an element at a unknown location, you would need to traverse the entire list. Suffice to say, this is not ideal. Instead, we have the option of foregoing the order of the elements in exchange for finding a matching element quicker. The ordering of the elements is whatever is best for that data structure internally. Enter theSet
.
A HashSet
and a TreeSet
are two examples of concrete implementations of a Set
. We’ll take a look at what they both are and when you may want to use one over the other.
Set type
One important thing to note with the Set
abstract collection type is that there are no duplicates. If you’re considering using a concrete implementation of Set
, such as a HashSet
or a TreeSet
, it is worth bearing this in mind.
HashSet
To understand a HashSet
, it can help to first understand what a hash table is. If you already know what one is, feel free to skip or gloss over the following section.
Hash tables
Hash tables are excellent for finding elements quickly. A hash table calculates an integer, known as the ‘hash code’, for every object within it. This is done using the instance fields from the object such that different objects would produce a different hash code.
When you create your own class, you’re responsible for creating (or generating) your own hash code. One important thing to bear in mind: your implementation of hashCode
must work with equals()
. If a.equals(b)
then a
and b
ought to have the same hash code.
Back to HashSet
The contains
method of a HashSet
is quick: you can quickly determine whether your HashSet
includes a particular object.
If you use the HashSet
iterator, it may seem as if the elements are being visited in a random order. The reason for this is that elements could be stored across a hash table. If you want to use a HashSet
, you should be content to make this trade-off.
TreeSet
What if you wanted a sorted set? Enter TreeSet
. Irrespective of the order in which you add the elements into the TreeSet
, they will come out in sorted order. Let’s take a look at an example:
We get back the following:
Finland
Nigeria
Peru
Spain
In other words, it’s sorted in alphabetical order.
A TreeSet
uses the tree data structure (a Red-Black tree). When a new element is added to a TreeSet
, it is added in the right place to ensure the TreeSet
always remains sorted. If you’re familiar with Red-Black trees, this should make sense. If not, take this on faith (or, better yet, look up Red-Black trees!).
Since the elements need to be ordered, you will have to provide a Comparator
if there is not one already for the type of object you want to store.
HashSet vs TreeSet
There are pros and cons to using a TreeSet
over a HashSet
. It takes a lot longer to add an element to a TreeSet
that it does inserting into a HashSet
. This is because you are paying for the added sorting overhead. If you want your elements to be sorted, a TreeSet
may be preferable.
It may be worth noting that there is a TreeMap
that is an implementation of a Map
if you need key-value pairs.
Photo by Ryan Hafey on Unsplash