10.3. Sample Hash Functions — CS3 Data Structures & Algorithms (2024)

10.3.1. Sample Hash Functions

10.3.1.1. Simple Mod Function

Consider the following hash function used to hash integers to a tableof sixteen slots.

  • Java
  • C++
 int h(int x) { return x % 16; }
 int h(int x) { return x % 16; }

Here “%” is the symbol for the mod function.

Settings

10.3. Sample Hash Functions — CS3 Data Structures & Algorithms (1) Saving... 10.3. Sample Hash Functions — CS3 Data Structures & Algorithms (2)
Server Error
Resubmit

Recall that the values 0 to 15 can be represented with four bits(i.e., 0000 to 1111).The value returned by this hash function depends solely onthe least significant four bits of the key.Because these bits are likely to be poorly distributed(as an example, a high percentage of the keys might be even numbers,which means that the low order bit is zero),the result will also be poorly distributed.This example shows that the size of the table \(M\)can have a big effect on the performance of a hash system because the table sizeis typically used as the modulus to ensure that the hashfunction produces a number in the range 0 to \(M-1\).

10.3.1.2. Binning

Say we are given keys in the range 0 to 999, and have a hash table ofsize 10.In this case, a possible hash function might simply divide the keyvalue by 100.Thus, all keys in the range 0 to 99 would hash to slot 0, keys 100 to199 would hash to slot 1, and so on.In other words, this hash function “bins” the first 100 keys to thefirst slot, the next 100 keys to the second slot, and so on.

Binning in this way has the problem that it will clustertogether keys if the distribution does not divide evenly on thehigh-order bits.In the above example, if more records have keys in the range 900-999(first digit 9) than have keys in the range 100-199(first digit 1), more records will hash to slot 9 than to slot 1.Likewise, if we pick too big a value for the key range and the actualkey values are all relatively small, then most records will hash toslot 0.A similar, analogous problem arises if we were instead hashing strings basedon the first letter in the string.

Settings

10.3. Sample Hash Functions — CS3 Data Structures & Algorithms (3) Saving... 10.3. Sample Hash Functions — CS3 Data Structures & Algorithms (4)
Server Error
Resubmit

In general with binning we store the record with key value \(i\)at array position \(i/X\) for some value \(X\)(using integer division).A problem with Binning is that we have to know the key range so thatwe can figure out what value to use for \(X\).Let’s assume that the keys are all in the range 0 to 999.Then we want to divide key values by 100 so that the result is in therange 0 to 9.There is no particular limit on the key range that binning couldhandle, so long as we know the maximum possible value in advance sothat we can figure out what to divide the key value by.Alternatively, we could also take the result of any binningcomputation and then mod by the table size to be safe.So if we have keys that are bigger than 999 when dividing by 100, wecan still make sure that the result is in the range 0 to 9 with a modby 10 step at the end.

Binning looks at the opposite part of the key value from the modfunction.The mod function, for a power of two, looks at the low-order bits,while binning looks at the high-order bits.Or if you want to think in base 10 instead of base 2, modding by 10 or100 looks at the low-order digits, while binning into an array of size10 or 100 looks at the high-order digits.

As another example, consider hashing a collection of keys whose valuesfollow a normal distribution, as illustrated byFigure 10.3.1.Keys near the mean of the normal distribution are far more likelyto occur than keys near the tails of the distribution.For a given slot, think of where the keys come from within the distribution.Binning would be taking thick slices out of the distribution and assignthose slices to hash table slots.If we use a hash table of size 8, we would divide the key range into 8equal-width slices and assign each slice to a slot in the table.Since a normal distribution is more likely to generate keys fromthe middle slice, the middle slot of the table is most likely to be used.In contrast, if we use the mod function, then we are assigning to any givenslot in the table a series of thin slices in steps of 8.In the normal distribution, some of these slices associated with any givenslot are near the tails, and some are near the center.Thus, each table slot is equally likely (roughly) to get a key value.

Figure 10.3.1: A comparison of binning vs. modulus as a hash function.

10.3.1.3. The Mid-Square Method

A good hash function to use with integer key values is themid-square method.The mid-square method squares the key value, and then takes out the middle\(r\) bits of the result, giving a value in the range0 to \(2^{r}-1\).This works well because most or all bits of the key value contribute tothe result.For example, consider records whose keys are 4-digit numbers in base10, as shown in Figure 10.3.2.The goal is to hash these key values to a table of size 100(i.e., a range of 0 to 99).This range is equivalent to two digits in base 10.That is, \(r = 2\).If the input is the number 4567, squaring yields an 8-digit number,20857489.The middle two digits of this result are 57.All digits of the original key value(equivalently, all bits when the number is viewed in binary)contribute to the middle two digits of the squared value.Thus, the result is not dominated by the distribution of the bottomdigit or the top digit of the original key value.Of course, if the key values all tend to be small numbers,then their squares will only affect the low-order digits of the hash value.

Figure 10.3.2: An example of the mid-square method. This image shows thetraditional gradeschool long multiplication process. The valuebeing squared is 4567. The result of squaring is 20857489.At the bottom, of the image, the value 4567 is show again, witheach digit at the bottom of a “V”. The associated “V” is showingthe digits from the result that are being affected by each digit ofthe input. That is, “4” affects the output digits 2, 0, 8, 5,an 7. But it has no affect on the last 3 digits. The key point isthat the middle two digits of the result (5 and 7) are affected byevery digit of the input.

Here is a little calculator for you to see how this works.Start with ‘4567’ as an example.

10.3.2. A Simple Hash Function for Strings

Now we will examine some hash functions suitable for storing stringsof characters.We start with a simple summation function.

  • Java
  • C++
 int sascii(String x, int M) { char ch[]; ch = x.toCharArray(); int i, sum; for (sum=0, i=0; i < x.length(); i++) { sum += ch[i]; } return sum % M; }
 int sascii(String x, int M) { char ch[]; ch = x.toCharArray(); int i, sum; for (sum=0, i=0; i < x.length(); i++) sum += ch[i]; return sum % M; }

This function sums the ASCII values of the letters in a string.If the hash table size \(M\) is small compared to theresulting summations, then this hash function should do agood job of distributing strings evenly among the hash table slots,because it gives equal weight to all characters in the string.This is an example of the folding method to designing a hashfunction.Note that the order of the characters in the string has no effect onthe result.A similar method for integers would add the digits of the keyvalue, assuming that there are enough digits to

  1. keep any one or two digits with bad distribution from skewing theresults of the process and

  2. generate a sum much larger than \(M\).

As with many other hash functions, the final step is to apply themodulus operator to the result, using table size \(M\) to generatea value within the table range.If the sum is not sufficiently large, then the modulus operator willyield a poor distribution.For example, because the ASCII value for ‘A’ is 65 and ‘Z’ is 90,sum will always be in the range 650 to 900 for a string of tenupper case letters.For a hash table of size 100 or less, a reasonable distributionresults.For a hash table of size 1000, the distribution is terrible becauseonly slots 650 to 900 can possibly be the home slot for some keyvalue, and the values are not evenly distributed even within thoseslots.

Now you can try it out with this calculator.

10.3.3. String Folding

Here is a much better hash function for strings.

  • Java
  • C++
// Use folding on a string, summed 4 bytes at a timeint sfold(String s, int M) { long sum = 0, mul = 1; for (int i = 0; i < s.length(); i++) { mul = (i % 4 == 0) ? 1 : mul * 256; sum += s.charAt(i) * mul; } return (int)(Math.abs(sum) % M);}
// Use folding on a string, summed 4 bytes at a timeint sfold(String s, int M) { long sum = 0, mul = 1; for (int i = 0; i < s.length(); i++) { mul = (i % 4 == 0) ? 1 : mul * 256; sum += s.charAt(i) * mul; } return (int)(Math.abs(sum) % M);}

This function takes a string as input.It processes the string four bytes at a time, and interprets each ofthe four-byte chunks as a single long integer value.The integer values for the four-byte chunks are added together.In the end, the resulting sum is converted to the range 0 to\(M-1\) using the modulus operator.

For example, if the string “aaaabbbb” is passed to sfold,then the first four bytes (“aaaa”) will be interpreted as theinteger value 1,633,771,873,and the next four bytes (“bbbb”) will beinterpreted as the integer value 1,650,614,882.Their sum is 3,284,386,755 (when treated as an unsigned integer).If the table size is 101 then the modulus function will cause this keyto hash to slot 75 in the table.

Now you can try it out with this calculator.

For any sufficiently long string, the sum for the integerquantities will typically cause a 32-bit integer to overflow(thus losing some of the high-order bits) because the resultingvalues are so large.But this causes no problems when the goal is to compute a hash function.

The reason that hashing by summing the integer representation of fourletters at a time is superior to summing one letter at a time is becausethe resulting values being summed have a bigger range.This still only works well for strings long enough(say at least 7-12 letters), but the original method would not workwell for short strings either.There is nothing special about using four characters at a time.Other choices could be made.Another alternative would be to fold two characters at a time.

10.3.4. Hash Function Practice

Now here is an exercise to let you practice these various hashfunctions.You should use the calculators above for the more complicated hashfunctions.

10.3.5. Hash Function Review Questions

Here are some review questions.

10.3. Sample Hash Functions — CS3 Data Structures & Algorithms (2024)
Top Articles
The Best Way to Import SVGs in React
When Will the Ripple v. SEC Lawsuit End? Lawyer Chips In
Katie Pavlich Bikini Photos
Gamevault Agent
Hocus Pocus Showtimes Near Harkins Theatres Yuma Palms 14
Free Atm For Emerald Card Near Me
Craigslist Mexico Cancun
Hendersonville (Tennessee) – Travel guide at Wikivoyage
Doby's Funeral Home Obituaries
Vardis Olive Garden (Georgioupolis, Kreta) ✈️ inkl. Flug buchen
Select Truck Greensboro
Things To Do In Atlanta Tomorrow Night
Non Sequitur
How To Cut Eelgrass Grounded
Pac Man Deviantart
Alexander Funeral Home Gallatin Obituaries
Craigslist In Flagstaff
Shasta County Most Wanted 2022
Energy Healing Conference Utah
Testberichte zu E-Bikes & Fahrrädern von PROPHETE.
Aaa Saugus Ma Appointment
Geometry Review Quiz 5 Answer Key
Walgreens Alma School And Dynamite
Bible Gateway passage: Revelation 3 - New Living Translation
Yisd Home Access Center
Home
Shadbase Get Out Of Jail
Gina Wilson Angle Addition Postulate
Celina Powell Lil Meech Video: A Controversial Encounter Shakes Social Media - Video Reddit Trend
Walmart Pharmacy Near Me Open
A Christmas Horse - Alison Senxation
Ou Football Brainiacs
Access a Shared Resource | Computing for Arts + Sciences
Pixel Combat Unblocked
Cvs Sport Physicals
Mercedes W204 Belt Diagram
Rogold Extension
'Conan Exiles' 3.0 Guide: How To Unlock Spells And Sorcery
Teenbeautyfitness
Where Can I Cash A Huntington National Bank Check
Facebook Marketplace Marrero La
Nobodyhome.tv Reddit
Topos De Bolos Engraçados
Gregory (Five Nights at Freddy's)
Grand Valley State University Library Hours
Holzer Athena Portal
Hampton In And Suites Near Me
Stoughton Commuter Rail Schedule
Bedbathandbeyond Flemington Nj
Free Carnival-themed Google Slides & PowerPoint templates
Otter Bustr
Selly Medaline
Latest Posts
Article information

Author: Golda Nolan II

Last Updated:

Views: 6447

Rating: 4.8 / 5 (78 voted)

Reviews: 93% of readers found this page helpful

Author information

Name: Golda Nolan II

Birthday: 1998-05-14

Address: Suite 369 9754 Roberts Pines, West Benitaburgh, NM 69180-7958

Phone: +522993866487

Job: Sales Executive

Hobby: Worldbuilding, Shopping, Quilting, Cooking, Homebrewing, Leather crafting, Pet

Introduction: My name is Golda Nolan II, I am a thoughtful, clever, cute, jolly, brave, powerful, splendid person who loves writing and wants to share my knowledge and understanding with you.