Hashing – Theory

Resources:

* MIT video lecture – Hashing with Chaining
* MIT video lecture – Table Doubling, Karp-Rabin
* MIT video lecture – Open Addressing, Cryptographic Hashing

A hash map's best and average case for search, insert, and delete is O(1); the worst case is O(n). (Link)

HashMap get/put complexity –
It's usually O(1), assuming a decent hash function that itself runs in constant time. In the worst case a HashMap lookup is O(n), because it has to walk through every entry in the same hash bucket (e.g. if all the keys share the same hash code). Fortunately that worst case rarely comes up in practice, so while O(1) certainly isn't guaranteed, it's usually what you should assume when deciding which algorithms and data structures to use. In JDK 8, HashMap was tweaked so that if keys can be compared for ordering, any densely populated bucket is implemented as a tree; even with many entries sharing a hash code, the complexity is then O(log n).

On average, hash map operations cost O(n/m), where n is the number of items and m is the table size; this ratio n/m is the load factor. For example, with n = 1000 items spread over m = 500 buckets, each bucket holds 2 entries on average. Resizing the table to keep n/m bounded by a constant is what makes the average cost O(1).
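To see the worst case concretely, here is a small Python sketch (the BadKey class is hypothetical, purely for illustration): a key type whose hash is constant forces every entry into the same bucket/probe chain, so each dict operation degrades toward O(n).

```python
class BadKey:
    """Hypothetical key whose hash always collides."""
    def __init__(self, v):
        self.v = v

    def __hash__(self):
        return 42  # every instance hashes to the same slot

    def __eq__(self, other):
        return isinstance(other, BadKey) and self.v == other.v

# Each insert scans the whole chain of colliding keys: O(n) per
# operation, versus O(1) for well-distributed keys like plain ints.
d = {}
for i in range(2000):
    d[BadKey(i)] = i
print(d[BadKey(1999)])  # still correct, just slow: prints 1999
```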

Link – How does Java handle hash collisions internally?
Prior to Java 8, HashMap and the other hash-table-based Map implementations in Java handled collisions by chaining: they used a linked list to store the map entries that ended up in the same bucket. If a key hashed to a bucket where an entry was already stored, the new entry was simply added at the head of that bucket's linked list. In the worst case this degrades the performance of HashMap's get() method from O(1) to O(n). To address frequent collisions, Java 8 started using a balanced tree instead of a linked list for storing collided entries, which improves the worst case from O(n) to O(log n). The threshold for switching to the balanced tree is the TREEIFY_THRESHOLD constant in the JDK 8 java.util.HashMap source.
Its current value is 8: once a bucket holds more than 8 entries, HashMap uses a tree instead of a linked list to hold them.

IITD video lectures: 1, 2
Read the theory from the book Data Structures and Algorithms Using Python (Ch. 11).

“Describe a simple implementation of a hash table.”

Read the blog here. At first blush this seems a simple enough request. My Python brain immediately jumped to the thought, "well, of course dict is Python's implementation of a hash table…", but in a world where dict doesn't exist, how would you define its behavior? In the simplest terms possible, a hash table is a data structure that maps keys to values. Given an array-like, indexable data type, we can simply use a hashing function to generate an index for a given key and insert the value into the array-like structure at that generated index.
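A minimal sketch of that idea in Python (the class name and fixed table size are our own choices; it deliberately ignores collisions, which the techniques below exist to handle):

```python
class NaiveHashTable:
    """Minimal hash table: hash the key to an index, store the value there.

    Deliberately ignores collisions: two keys that hash to the same
    slot will overwrite each other. Collision handling comes next.
    """
    def __init__(self, size=16):
        self.size = size
        self.slots = [None] * size

    def _index(self, key):
        return hash(key) % self.size  # map any hashable key to a slot

    def put(self, key, value):
        self.slots[self._index(key)] = (key, value)

    def get(self, key):
        entry = self.slots[self._index(key)]
        if entry is not None and entry[0] == key:
            return entry[1]
        raise KeyError(key)

t = NaiveHashTable()
t.put("apple", 3)
print(t.get("apple"))  # 3
```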


To prevent/handle collisions there are two basic techniques:

  1. Separate Chaining (see the sketch just after this list)
  2. Open Addressing: also known as Closed Hashing. There are three variants of this:
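
A minimal separate-chaining sketch in Python (the class name and default table size are our own, not from the post): each slot holds a Python list of (key, value) pairs, and colliding entries simply share that list.

```python
class ChainedHashTable:
    """Separate chaining: each slot holds a list of (key, value) pairs."""
    def __init__(self, size=16):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def put(self, key, value):
        bucket = self.buckets[hash(key) % self.size]
        for i, (k, _) in enumerate(bucket):
            if k == key:                     # key already present: update it
                bucket[i] = (key, value)
                return
        bucket.append((key, value))          # collision: just extend the chain

    def get(self, key):
        bucket = self.buckets[hash(key) % self.size]
        for k, v in bucket:
            if k == key:
                return v
        raise KeyError(key)
```

With a good hash function the chains stay short (length n/m on average), which is where the O(1) average cost comes from. The open-addressing variants follow.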

a) Linear Probing

Let hash(x) be the slot index computed using the hash function and S be the table size:

If slot hash(x) % S is full, then we try (hash(x) + 1) % S
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S

Linear probing stores elements directly in the table even when collisions occur. When inserting, we must make sure the table is not already full, since a probe only terminates when it finds an empty slot.
A search stops when it reaches an empty slot. Deletion needs care: simply emptying a slot would break the probe chains that pass through it, so instead we mark the slot with a tombstone (a "deleted" flag). A search does not stop at a tombstone but probes past it, while an insert may reuse a tombstone slot for a new entry. If tombstones accumulate, the whole table can be rehashed to reclaim the wasted space; rehashing is also worthwhile when the table has become small compared to the number of elements stored.
Linear probing uses less memory than the chaining method (where colliding elements are stored in linked lists), but it can be slower than chaining because clustering may force a lookup to walk along a long run of occupied slots.
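
A sketch of linear probing with tombstone deletion, under the same assumptions as the earlier sketches (names are ours; a real implementation would also resize when the load factor grows):

```python
_TOMBSTONE = object()  # sentinel marking a slot whose entry was deleted

class LinearProbingTable:
    """Open addressing with linear probing and tombstone deletion."""
    def __init__(self, size=16):
        self.size = size
        self.slots = [None] * size   # None = never used

    def _probe(self, key):
        """Yield hash(key) % S, (hash(key)+1) % S, (hash(key)+2) % S, ..."""
        h = hash(key) % self.size
        for i in range(self.size):
            yield (h + i) % self.size

    def put(self, key, value):
        first_free = None                        # earliest reusable tombstone
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is None:                     # key is not in the table
                target = idx if first_free is None else first_free
                self.slots[target] = (key, value)
                return
            if slot is _TOMBSTONE:
                if first_free is None:
                    first_free = idx             # remember it, keep scanning
            elif slot[0] == key:
                self.slots[idx] = (key, value)   # update existing key
                return
        if first_free is not None:               # only tombstones left to reuse
            self.slots[first_free] = (key, value)
            return
        raise RuntimeError("hash table is full")

    def get(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is None:                     # empty slot ends the search
                raise KeyError(key)
            if slot is not _TOMBSTONE and slot[0] == key:
                return slot[1]
        raise KeyError(key)

    def delete(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is None:
                raise KeyError(key)
            if slot is not _TOMBSTONE and slot[0] == key:
                self.slots[idx] = _TOMBSTONE     # flag it, don't empty it
                return
        raise KeyError(key)
```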

b) Quadratic Probing:

In the i-th iteration we probe the slot at offset i² from hash(x).

Let hash(x) be the slot index computed using the hash function and S be the table size:
If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
..................................................
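
Only the probe sequence changes, so a sketch can be just a generator that could stand in for `_probe` in the linear-probing class above (the function name is ours):

```python
def quadratic_probe(key, size):
    """Yield hash(key) % S, (hash(key)+1) % S, (hash(key)+4) % S, ...

    The offset grows as i*i, which spreads colliding keys out and
    reduces the clustering seen with linear probing.
    """
    h = hash(key) % size
    for i in range(size):
        yield (h + i * i) % size
```

One caveat worth knowing (a standard result, not from this post): quadratic probing is not guaranteed to visit every slot; if the table size is prime and the load factor stays below 1/2, though, an empty slot is always found.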

c) Double Hashing:

We use a second hash function hash2(x) and, in the i-th iteration, probe the slot at offset i·hash2(x) from hash(x).

Let hash(x) be the slot index computed using the hash function and S be the table size:
If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
..................................................
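
A matching sketch for the double-hashing probe sequence; the secondary hash 1 + hash(key) % (S - 1) is one common textbook choice, assumed here rather than taken from this post:

```python
def double_hash_probe(key, size):
    """Yield (hash(key) + i * hash2(key)) % S for i = 0, 1, 2, ..."""
    h = hash(key) % size
    # hash2 must never return 0, or every probe would hit the same slot.
    h2 = 1 + (hash(key) % (size - 1))
    for i in range(size):
        yield (h + i * h2) % size
```

If S is prime, hash2(x) can never share a factor with S, so the probe sequence visits all S slots before repeating.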