The Under-Appreciated Hash Function–Illustrated using a Birthday Lookup
Considering the amount of time they have saved mankind, Hash-functions are among the most under-appreciated gems of computing. A hashing function will typically, using just a SINGLE lookup – retrieve a deeply buried patient record or a stock quote or a bank balance. Just a single lookup! It is the FASTEST structure possible when it comes to looking up (searching for) stuff (after all, it is hard to do better than JUST ONE lookup). The non-hash structures used for searching for stuff, typically take much longer. A hash function saves not just minutes but HOURS on every single search.
Consider the simplest example of the birthday problem – given a day and a month – you need to look up the name of a person who has a birthday on that date.
For simplicity, let us assume that there are only 365 people in our data set and each one has a unique birthday.
The LinkedList Solution:
If you were to use a linked list to store each person’s name and birthdate, you would have to visit an average of 183 nodes to perform this simple lookup.
The Hashtable Solution:
With a hashtable, this average is reduced to ONE!
That’s right – just ONE direct lookup retrieves the appropriate person’s name – all the lookup does is take the input (day and month) – computes its hash (which gives it a number between 1 and 365) – and uses that as an index into an array containing the names. So – if my birthday is on the 14th of Feb (which happens to be the 45th day of the year) – the hashing function should correctly return the number 45. My storage array’s 45th element – Array[45] – should contain my name. No traversing down any list – just one simple, straight to the point, lookup.
What is the catch? The trick, of course, lies in finding a way to map all those days and months to a unique number.
Somehow – we need to go from two numbers (the MONTH – running from 1 through 12 – and the DAY – running from 1 through 31) – and come up with a single number running from 1 through 365 (or 0 through 364 to appease arrays in most languages).
Here is a simple way to accomplish this feat.
Step 1 : Store a simple static array containing 12 entries – for the days elapsed when you reach the FIRST of each month. So for January 1st – this entry would be 0 for February 1st, it would be 31 (since 31 days have elapsed before you get to Feb 1st), for March 1st, it would be 60. And so on for each of the 12 months.
Step 2: Given a birthday (month and day) – say Feb 14th – all you do is look up the February days elapsed from step 1 (which is 31) – and add 14 to it – to give you 45. For March 10th, you would take 60 and add 10 to it – to give you 70. All we are saying here is that March 10th is the 70th day of the year and Feb 14th is the 45th day of the year.
int CreateHashUsingDayandMonth(int day, int month){int[12] days_elapsed = {0, 31, 60, 121, 152, 182, 213, 244, 274, 305, 335};
return days_elapsed[month] + day
}This is an example of a perfect hashing function – since we assumed our data set was nice enough that there were no duplicates. If we have duplicates, we run into ‘collisions’ which are the norm rather than the exception. More on those in a future post.
For now, the beauty of the simple hashing function above is that we have reduced the average number of lookups from 183 to 1 !
Don’t you think that hashes (hash functions) are under-appreciated? When was the last time you thanked a hash function?
Leave a Reply