Hash Functions

- May 22, 2018

Hash functions are one-way functions that map an input of arbitrary length to a compressed, fixed-length output value. The input can be of any size, but the output always has a fixed number of bits. The result acts as a fingerprint of the file, message, or data.

source: Wikipedia

You cannot reverse a hash function to recover the input value. If you have the content, you can run it through the hash function to calculate the hash value, but the other direction is not feasible: given only the hash value and the function, you can't compute the input text.
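As a small illustration of the fixed-size fingerprint idea, here is a minimal sketch using Python's standard hashlib module. SHA-256 is just one possible choice of hash function, and the helper name fingerprint is made up for this example.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

small = fingerprint(b"a")               # 1-byte input
large = fingerprint(b"a" * 1_000_000)   # 1 MB input

# Both digests have the same length, regardless of the input size.
print(len(small), len(large))

# The same input always gives the same digest; a one-character
# change gives a completely different one.
print(fingerprint(b"hello") == fingerprint(b"hello"))
print(fingerprint(b"hello") == fingerprint(b"hellp"))
```

Note that there is no function here that goes from the digest back to the input. The only general way to "reverse" it is to guess inputs and hash each guess.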

Below are two scenarios in which a hash function becomes useless.

If the function is reversible, it's not secure. Hash values are often exposed to the public, so we don't want anyone who sees a hash value to be able to recover the input from it.
The function may not be collision resistant. If two different input values return the same hash value, an attacker gets a chance to cheat, because digital certificates and signatures rely on hash functions.

A hash function can be written in the following simple form.

H = h(M)

H = hash value

h = hash function

M = input value

If it's infeasible to find any pair x ≠ y such that h(x) = h(y), the function is called strongly collision resistant.

For a given x, if it's infeasible to find a y ≠ x such that h(x) = h(y), the function is weakly collision resistant.

Given H, if it's infeasible to find an x such that h(x) = H, the function has the one-way property.
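To see why collision resistance depends on the output size, here is a rough sketch that finds a collision for a deliberately weak toy hash: only the first 2 bytes (16 bits) of SHA-256. The names truncated_hash and find_collision are hypothetical helpers for this example, not part of any library.

```python
import hashlib

def truncated_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """Toy hash: the first n_bytes of SHA-256 (deliberately too short)."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 2):
    """Return two different inputs that share the same truncated hash."""
    seen = {}      # truncated digest -> first message that produced it
    counter = 0
    while True:
        msg = str(counter).encode()
        digest = truncated_hash(msg, n_bytes)
        if digest in seen:
            return seen[digest], msg
        seen[digest] = msg
        counter += 1

x, y = find_collision()
print(x, y, truncated_hash(x).hex(), truncated_hash(y).hex())
```

With only 65,536 possible outputs the search finishes almost instantly. The same search against the full 256-bit output is what is considered infeasible.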

Let’s discuss some hash functions.

1. Message Digest 5 (MD5)

MD5 is the fifth in the MD series of algorithms. It produces a 128-bit hash, and the hash depends on every bit of the message. It has fallen out of use because its collision resistance is broken: in 2013, cryptanalysts published a technique that breaks MD5's collision resistance in less than a second.

2. Secure Hash Algorithm (SHA)

Up to now, three versions of SHA have been introduced: SHA-1, SHA-2, and SHA-3.

SHA-1 produces a 160-bit hash value, and cryptanalysis has shown that SHA-1 is no longer secure. It was nevertheless employed in several widely used applications and protocols, including Secure Sockets Layer (SSL). SHA-2 was introduced next; it contains six hash functions that return 224, 256, 384, or 512-bit hashes.
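The digest sizes can be checked with Python's standard hashlib module. This is only a quick sketch: it covers SHA-1 and four of the SHA-2 functions, and leaves out the two truncated SHA-512 variants because they are not guaranteed to be available in every Python build.

```python
import hashlib

message = b"hash functions"
sizes = {}

for name in ("sha1", "sha224", "sha256", "sha384", "sha512"):
    h = hashlib.new(name, message)
    sizes[name] = h.digest_size * 8   # digest_size is in bytes
    print(f"{name}: {sizes[name]} bits -> {h.hexdigest()[:16]}...")
```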

There are no known practical attacks against SHA-2 yet :) But there is a perceived risk, because SHA-2 is built on the same design approach (the Merkle-Damgård construction) as SHA-1 and MD5. For that reason NIST opened a public competition for SHA-3 in 2007, selected Keccak as the winner in 2012, and standardized it in 2015. SHA-3 uses a completely different design, the sponge construction, from the earlier hash functions.

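With SHA-3, the hash length can be chosen by the person computing the hash: besides the fixed 224, 256, 384, and 512-bit variants, the family includes extendable-output functions (SHAKE128 and SHAKE256) that can produce a digest of any desired length.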
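Here is a minimal sketch of the "any desired length" idea, assuming Python 3.6 or later, where hashlib ships the SHA-3 functions and the SHAKE extendable-output functions. The lengths 8 and 64 are arbitrary picks for illustration.

```python
import hashlib

data = b"hash functions"

# Fixed-length SHA-3 variant: always 32 bytes (256 bits).
fixed = hashlib.sha3_256(data).digest()

# SHAKE lets the caller choose the output length in bytes.
short = hashlib.shake_128(data).digest(8)
longer = hashlib.shake_128(data).digest(64)

print(len(fixed), len(short), len(longer))
# The shorter output is a prefix of the longer one.
print(longer.startswith(short))
```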

Now here comes the politics of cryptography. :D Since the US government was involved in creating the SHA-1 and SHA-2 algorithms, some people tend not to trust the SHA family. That is where the RIPEMD algorithm comes in.

3. RACE Integrity Primitives Evaluation Message Digest - RIPEMD

RIPEMD was developed by an open research community in Belgium and is known as a family of European hash functions. It produces 128, 160, 256, and 320-bit hashes, but the 160-bit version (RIPEMD-160) is the one mostly used today.

4. Whirlpool

Whirlpool is a block-cipher-based hash function that produces a 512-bit hash. It has gone through three versions (WHIRLPOOL-0, WHIRLPOOL-T, and WHIRLPOOL).

	SHA-1	MD5	RIPEMD-160	Whirlpool
Digest length	160 bits	128 bits	160 bits	512 bits
Basic unit of processing	512 bits	512 bits	512 bits	512 bits
Number of steps	80 (4 rounds of 20)	64 (4 rounds of 16)	160 (5 paired rounds of 16)	10

A few applications of hashing are password storage, data integrity checks, and digital signatures.
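As a rough sketch of the password storage use case: instead of storing the password, store a random salt and a slow, salted hash of it. The example below uses PBKDF2 from Python's standard library. The iteration count and the helper names hash_password and verify_password are assumptions for illustration, not a recommendation.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for this example

def hash_password(password, salt=None):
    """Return (salt, digest) for the given password."""
    if salt is None:
        salt = os.urandom(16)   # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected):
    """Re-hash the candidate password and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Only the salt and the digest need to be stored. Because the hash is one way, someone who reads the stored values still has to guess passwords one by one.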

Cryptocurrencies and blockchains are among the most common use cases for hash functions.

Let’s discuss more use cases of Hashing.

1. HMAC - Hash-based Message Authentication Code

HMAC is a MAC built on a hash function, which gives it some advantages over other MAC constructions. Let's discuss the MAC first.

Message Authentication Code (MAC) - uses a symmetric (shared-key) cryptography mechanism

A MAC is a short, fixed-size block of bytes computed from a message and a secret key. Its value depends on both the key and the message. Since the receiver can recalculate the MAC with the same key and compare it to the one received to check the integrity of the data, the MAC doesn't need to be reversible.

Source: wikipedia

A MAC can be calculated using the formula below.

MAC = F(K, M)

K=Key

M=Message
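The sender and receiver flow can be sketched as follows. Here HMAC-SHA256 from Python's standard hmac module stands in for F, which is an assumption made for this example since the formula above does not fix a particular F. The key and messages are made up.

```python
import hashlib
import hmac

def compute_mac(key: bytes, message: bytes) -> bytes:
    """F(K, M): HMAC-SHA256 is used as a stand-in for F in this sketch."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver side: recompute the MAC and compare in constant time."""
    return hmac.compare_digest(compute_mac(key, message), tag)

key = b"shared-secret-key"              # known to sender and receiver only
message = b"transfer 100 to alice"
tag = compute_mac(key, message)         # sender attaches this to the message

print(verify(key, message, tag))                    # untouched message
print(verify(key, b"transfer 900 to alice", tag))   # altered in transit
```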

Since the key and the message are hashed in separate steps, HMAC is considered more secure than simply hashing the key and the message together. Using a hash function also brings several practical advantages.

Hash functions generally run faster in software than the block ciphers used by other MACs, and HMAC is published as the internet standard RFC 2104. Because both sides share a secret key, the receiver can confirm that the message was not altered on the network.

The function below is used to generate the HMAC.

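HMAC(key, msg) = H(mod1(key) || H(mod2(key) || msg))

Here H is the hash function, || means concatenation, and mod1(key) and mod2(key) are the key XORed with two different fixed padding constants.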
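The formula can be spelled out in a few lines of Python. This is a minimal sketch following RFC 2104 with SHA-256 as H (my choice for the example): mod1 is the key XORed with the byte 0x5C and mod2 is the key XORed with 0x36. In real code you would use the standard hmac module directly; the hand-written version is only here to show the two hashing steps.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes 64-byte blocks

def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    """Hand-rolled HMAC-SHA256, for illustration only."""
    # Keys longer than one block are hashed first, then zero-padded.
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")

    outer_key = bytes(b ^ 0x5C for b in key)   # mod1(key)
    inner_key = bytes(b ^ 0x36 for b in key)   # mod2(key)

    inner = hashlib.sha256(inner_key + msg).digest()    # H(mod2(key) || msg)
    return hashlib.sha256(outer_key + inner).digest()   # H(mod1(key) || inner)

key, msg = b"secret", b"hello world"
print(hmac_sha256(key, msg) == hmac.new(key, msg, hashlib.sha256).digest())
```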

Even so, HMAC can still be targeted with brute-force attacks and birthday attacks.

The birthday attack takes its name from the birthday paradox. If there are 23 or more people in a room, there is a better than 50% chance that two of them share a birthday. If I insist on being one of the pair, I need about 253 other people in the room before the chance that someone shares my birthday passes 50%, because each person gives me only one comparison.

But if any pair counts, 23 people are enough to form those 253 comparisons: 23 people make 23 × 22 / 2 = 253 distinct pairs. Those 253 pairs push the chance of some pair sharing a birthday above 50%.
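The numbers 23 and 253 can be checked with a few lines of Python. This is a small sketch that assumes 365 equally likely birthdays and ignores leap years; the function names are made up for the example, and math.comb needs Python 3.8 or later.

```python
from math import comb

def p_shared_birthday(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (days - i) / days
    return 1.0 - p_distinct

def p_match_my_birthday(others: int, days: int = 365) -> float:
    """Probability that at least one of `others` people shares MY birthday."""
    return 1.0 - ((days - 1) / days) ** others

print(comb(23, 2))                # number of distinct pairs among 23 people
print(p_shared_birthday(22))      # just under 0.5
print(p_shared_birthday(23))      # just over 0.5
print(p_match_my_birthday(252))   # just under 0.5
print(p_match_my_birthday(253))   # just over 0.5
```

The same effect is what makes a birthday attack on a hash cheaper than a brute-force search for one specific value: any matching pair counts, so far fewer attempts are needed.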

The image below, which I found on the internet, illustrates the birthday paradox.