Hands-On High Performance Programming with Qt 5
上QQ阅读APP看书,第一时间看更新

Hash tables

When looking up elements, this data structure gives us an unparalleled O(1) performance! It uses a key, which is then encoded by a hashing function into an integer, which is in turn used to access a position in an array. It's instant access. We have only to translate our key into an index and remember which key denotes which data.

But, as always, there are also problems. The hashing function can from time to time produce the same index for different keys (this is called a collision). We can resolve that problem in two ways - either we form a list of elements for a given index (chaining), or we just look for another free place in the array, in the simplest case for the next one (open addressing). The first method results in bad data locality; the second is more complicated to implement and requires users to define a special value denoting a deleted element. However, because of such a good lookup performance, there is much research ongoing to optimize both those strategies, with open addressing generally held to be more performant.

The Standard Template Library (STL) uses the chaining implementation because the C++ standard requires pointer stability for an unordered_map. If you want to use the open addressing scheme Google's dense_hash_map is a well-known implementation and there is a recent F14 hash table from Facebook's Folly library. 

The second problem with hash tables is the handling of the internal array growth - when the hash table is automatically resized, all its elements have to be rehashed! Thus, we say that the O(1) search complexity is only an amortized complexity—with a choice of the right growth strategy, the reallocation costs counted together with the regular access times still give us the asymptotic O(1) complexity.

Another problem with hash tables is that for bad hashing functions the hash table will have too many collisions and will degenerate! For example, a chaining implementation could degenerate into a single list with search times back to the array's O(n)!