Hash Functions - Back-End Engineering Curriculum - Turing School of Software and Design (2024)

Learning Goals

  • Understand what a hash function is
  • Understand the key characteristics of hash functions that are used to hash passwords
  • Become familiar with two of the most common hash functions, MD5 and SHA-256

Vocabulary

  • Hash Function
  • Hash/Digest
  • Collision
  • Determinism

What is a Hash Function?

From Wikipedia: “A hash function is any function that can be used to map data of arbitrary size onto data of a fixed size”.

An important distinction to make here is that a Hash Function is different than a Ruby Hash. They are similar, but not the same thing.

In the following diagram (also from Wikipedia), we are using a Hash Function to take names and map them to numbers.

Hash Functions - Back-End Engineering Curriculum - Turing School of Software and Design (1)

Vocabulary

  • Collision - When two different inputs map to the same output. You can see this in the example above. Two names map to the same number.
  • Hash or Digest - The output of a hash function. There are other names, but those are the two we will use most often.

Key Characteristics of a Hash Function

  1. Deterministic - The same input will give the same output
  2. Collision Resistant - It is unlikely that different inputs will map to the same output
  3. Fixed Size Output - The output will always be the same size no matter the input

Uses for Hash Functions

Hash Functions have many uses, including but not limited to storing data (often called Hash Tables), Caching, and finding duplicate records. The SHA-1 hash function is used to create the git commit hash.

The most common and probably most important use of Hash Functions is for storing passwords.

We never ever ever store a user’s raw password in a database.

So instead, we store a hash of their password.

Key Characteristics for Password Hash Functions

  1. One Way - A hash can be easily computed for an input, but it is infeasible (note: not impossible), to reverse it. In the example above, this would mean that it would be easy to go from a name to a number, but infeasible to go from a number to a name.
  2. Discontinuous - Inputs that are similar do not have a similar output. In the example above, if we hashed the name “Sam Do”, it should not be close to the hashed value of 4 for “Sam Doe”.

This exploration will use the following two hash functions:

def hash_me_1(snack) snack.length % 6end
def hash_me_2(snack) hash1 = (snack.length * 13) % 7 hash2 = (snack.length * 2) % 6 ((hash1 + hash2) % 7).to_s + hash1.to_s + hash2.to_s;end

Process:

  1. Take a snack
  2. Using hash_me_1 (defined above), compute a digest based on the name of the snack (excluding spaces)
  3. Place your snack in the bin labeled with the digest you computed
  4. Discuss with the other people at your bin:
    • Does this hash function have the characteristics we discussed earlier?
  5. Repeat steps 2 - 4 for hash_me_2
  6. Eat snack

You’ve done some hash building in the physical space. Let’s practice doing it with the machine, using better algorithms.

We are going to experiment with two of the most common hash functions:

You should use the hexdigest method for computing digests for both MD5 and SHA-256.

Instructions

We’ll provide timing constraints on the board. If you’re moving quickly and feeling confident, complete the Extension questions. But if you’re not, make sure you complete all the non-Extension exercises first.

Activity 1: MD5 and SHA-256

Collaborating with the person next to you, experiment and discuss answers to these questions:

  1. What is the MD5 digest of your snack string?
  2. What is the Sha256 digest of the same string?
  3. Repeat both MD5 hashing and Sha256 hashing, but change the input to include a one-letter-off typo. What do you notice about the output? What if you add a blank space to the end of the string? What if you change the capitalization of one letter?
  4. Multiply your string 1000x (like “chipschipschipschips…” and hash it again through each algorithm. What’s notable about the output as compared to previous runs?

Then, in your own notebook, jot quick notes to solidify your learning:

  1. How does a small change to the input of a hash change the output?
  2. Why does the answer to "A" matter?
  3. How does a massive change to the size of the input change the output?
  4. Based on this analysis, can you come up with three potential use cases for hash functions *besides* passwords?

Extension

Feeling good about hashing? Try answering these questions:

  1. Which algorithm (MD5 / Sha256) is faster? Can you prove it using a dataset of at least 200 inputs and calculating the percentage speed difference?
  2. Is the percentage difference consistent if you increase the size of each individual input data by 100 times?
  3. Is there a scenario where you’d want to intentionally choose a slower algorithm? Why?

Activity 2: Bad Secrets

Let’s test your understandings by completing this section on your own. Use your pair as a resource if you get stuck, but try to complete the work on your own.

A hashing function is said to be “one-way” which is often useful in security, but it’s not fool-proof. Say you hack into my application and are able to retrieve all my users’ hashed passwords. You find that the account with username [email protected] has this hashed password:

3e40106b8f4332e18d76e94124d9c82a

Based on the length of the digest you guess it’s an MD5. You know that some users, particularly bosses, are lazy and they do dumb things like re-use their 4-digit ATM pin for their password. But the application required a password of eight digits, so they might have repeated the pin.

Work out answers to the following questions in your notebook:

  1. What’s this user’s password?
  2. Would the user’s password have been “more secure” if they used eight letters rather than eight numbers? Explain your thinking.

Extension

A “rainbow table” makes this reverse engineering much faster.

  1. Can you generate a CSV file that has two columns: the first column contains all possible 8-digit codes following the 4+4 rule above, then the second column has the MD5 digest for that input.
  2. If you now have an MD5 digest for an input that is expected to follow the 4+4 rule, how long does it take you to “crack” a password using the rainbow table?
  3. Could you generate a similar table for the all words of eight or more letters in the dictionary? (hint: you have a text file dictionary on your filesystem at /usr/share/dict/words)
  4. Some developers choose to make passwords more obscure by applying the hashing algorithm more than once (ie: original input into the algorithm, then that output into the algorithm again). Can you expand your dictionary table with columns for double-hashing, quad-hashing, and octo-hashing? As our time is limited, you might need to constrain your input set ;)
  • What is a Hash Function?
  • What is a Collision?
  • Define each of the following Hash Function characteristics and explain why they are important:
    • Deterministic
    • Collision Resistant
    • Fixed Sized Output
    • One Way
    • Discontinuous
Hash Functions - Back-End Engineering Curriculum - Turing School of Software and Design (2024)
Top Articles
Don’t say the wrong “OK” in French.
We are sorry, the page you requested cannot be found
Craigslist Free En Dallas Tx
Promotional Code For Spades Royale
Brady Hughes Justified
Kraziithegreat
St Petersburg Craigslist Pets
Txtvrfy Sheridan Wy
Ub Civil Engineering Flowsheet
Kent And Pelczar Obituaries
Gameday Red Sox
Hssn Broadcasts
Ree Marie Centerfold
7543460065
SXSW Film & TV Alumni Releases – July & August 2024
Hellraiser III [1996] [R] - 5.8.6 | Parents' Guide & Review | Kids-In-Mind.com
Weather Rotterdam - Detailed bulletin - Free 15-day Marine forecasts - METEO CONSULT MARINE
Florida History: Jacksonville's role in the silent film industry
Nick Pulos Height, Age, Net Worth, Girlfriend, Stunt Actor
Royal Cuts Kentlands
Yard Goats Score
Halo Worth Animal Jam
Aerocareusa Hmebillpay Com
Sullivan County Image Mate
Cincinnati Adult Search
Who is Jenny Popach? Everything to Know About The Girl Who Allegedly Broke Into the Hype House With Her Mom
Piedmont Healthstream Sign In
Fiona Shaw on Ireland: ‘It is one of the most successful countries in the world. It wasn’t when I left it’
Restaurants In Shelby Montana
Things to do in Pearl City: Honolulu, HI Travel Guide by 10Best
Shia Prayer Times Houston
Nurofen 400mg Tabletten (24 stuks) | De Online Drogist
Craigslist Hamilton Al
The Mad Merchant Wow
Rogers Centre is getting a $300M reno. Here's what the Blue Jays ballpark will look like | CBC News
What Does Code 898 Mean On Irs Transcript
Gold Dipping Vat Terraria
The best bagels in NYC, according to a New Yorker
Lake Kingdom Moon 31
Pulitzer And Tony Winning Play About A Mathematical Genius Crossword
Cocorahs South Dakota
Cabarrus County School Calendar 2024
3500 Orchard Place
Caphras Calculator
Sky Dental Cartersville
Sacramentocraiglist
Workday Latech Edu
4Chan Zelda Totk
Bradshaw And Range Obituaries
ESPN's New Standalone Streaming Service Will Be Available Through Disney+ In 2025
Prologistix Ein Number
Latest Posts
Article information

Author: Eusebia Nader

Last Updated:

Views: 6096

Rating: 5 / 5 (60 voted)

Reviews: 83% of readers found this page helpful

Author information

Name: Eusebia Nader

Birthday: 1994-11-11

Address: Apt. 721 977 Ebert Meadows, Jereville, GA 73618-6603

Phone: +2316203969400

Job: International Farming Consultant

Hobby: Reading, Photography, Shooting, Singing, Magic, Kayaking, Mushroom hunting

Introduction: My name is Eusebia Nader, I am a encouraging, brainy, lively, nice, famous, healthy, clever person who loves writing and wants to share my knowledge and understanding with you.