This might sound like a Twitter hashtag gimmick, but it is not. It is actually nothing to do with social media, at all. When hash functions are mentioned, a lot comes into scope ranging from security to integrity, to crypto. In fact, you use hash functions each day, and you are right now. It is one of the things your computer did while you connected to this blog; a *TLS handshake *– fancy as it may seem. In this article, we shall properly segment the hash function concept into understandable parts with the aim of building an overall understanding – ‘*hash functions demystified*‘.

To understand hash functions, one must accept to take in the meaning some few terms which include; *functions, map, size and code.* All these terms are as used in computer science, not other subjects. A *function* is a callable unit of code and *code* is a set of instructions forming a computer program executable by a computer. By ‘callable’, the note here is; *a block of code that can be called using an expression.* Let us use a simple, universal example, alright?

Please
note that you may need to know a bit of programming to understand this.
If you don’t, do not worry because this is not a programming article.
Check the example below:

*# …using** python** programming language for this example*

*#Defining the function*

define test_function(t):

return t+5

*#Calling the function*

test_function(t)

Next terms that need understanding; *map *and* size.* A map is a data element structure between two distinct data models while *size*
means the number of bits specific data is composed of. With those few
definitions, you can now fully understand the meaning of a hash
functions.

A **hash function** is a *function that can be used to map data of arbitrary size to data of a fixed size.*
It could be tempting to leave the definition as is but let us expound
on it a little further. Assume you have data (1 GB) in a flash disk, and
you decide to use a hash function on it. The product after using the
hash function – you get data that is 256 bits. You then get more data
(10 GB) and you take it though the same hash – you’ll get the same-size
256 bit output. How this works may seem out of the box but, it really is
a wonderful process that is quite simple. This article shall explain
just that.

Before we go ahead, I would like to strictly specify that hashing is
not encryption and encryption is not hashing. This is because of two
very significant characteristics of hashing; **determinism** and **irreversibility**.
This basically means that when hashing happens, the same input always
leads to the same output and you can never get the original input.
Encryption is reversible. If a hash function cannot meet at least these
minimum requirements, it is broken. Not useful.

It is also important to properly highlight the properties of a good
hash function starting with the two mentioned earlier. These
characteristics are dependent on the applications of the hash but are
important in all hash functions. **Determinism**

For
all hash functions; any specific input value must always result in the
same hash value. This excludes some scenarios but for the purposes of
this article, it shall stand as is.**Randomization**

For
a good hash function, an input value must be evenly mapped over the
hash output range and with the same probability. This is important
because guessing the value of the input should prove difficult.**Non-invertibility**

Given the hashed value h(x), it would be unfeasible to reconstruct the input x in a reasonable amount of time.

Hash functions can be grouped into three distinct categories based on their purpose: *indexing, integrity* and *cryptographic* hash functions. Let us consider each of the categories.

**Indexing**

This kind of hash is used to improve
performance by grouping values of equal or similar parts in collective
operations. This in turn improves fundamental tasks such as searching.
For instance, each file stored in a computer can be pre-computed into a
hash and stored in an array. Once a search is performed, the array is
searched instead of the files storage. This reduces the time
significantly. These hash functions are generally specialized and have
high performance. An example of such a hash is Java’s *hash code.*

**Integrity**

Consider a scenario where you are
sending files to a pal over the internet. It is however paramount that
the files sent be the exact same copy because accidental corruption can
occur. One of the ways you could ensure that the file is the exact copy
is to send them several times and let your pal confirm that they are all
the same. This would be impractical while sending files that are say; 1
TB. Is it even economically feasible to do that?

To counter such a
bottleneck, integrity hashes (redundancy hashes) are used. You send a
file with its hash and your pal just needs to compare the hashes once he
/ she computes the hash of the file you sent them. An example of this
type of hash is the CRC32. It is important to note that CRC32 is broken.

A redundancy hash has to have a high avalanche coefficient, work with all types of data and have a high performance.

**Cryptography**

Cryptographic hash functions are very secure and **hard**
to break. Emphasis is put on making sure that they are irreversible and
also have a very high avalanche coefficient. Their applications include
but are not limited to; password storage and blockchain technologies.
They are also resistant to attacks and collision, plus they should be
universal in the nature of data they work with.

**So, how do hash functions work? **

For us to understand the working of hash functions, we could consider
as simple one and you can build your knowledge from there; the *mode hash function.*

int h(int x) { ** return** x *% 10;*

}

This function maps any integer onto one of the indexes of an array of ten slots. Remember that **array indexing** starts at zero (0). See the image below.

The **%** symbol is used to mean *modulus* as in
mathematics – the remainder after division. If you have a number 3 and
you divide it by 2, you get 1 remainder 1. This expression can be
represented as;

3 % 2 = 1.

This is the logic we shall use with our *mod hash function. *

With our function, the divider we shall use is 10 therefore consider a value *x*. To store it in our array, we store it at index *x % 10. *

Consider searching a value where** x = 392** using the same function.**392 % 10 = 2**

Therefore,
all you have to do is check if the value in index 2 is equal to 392. If
it is not, then there is no such value stored in the array.

This example can form a basis for how hash functions work and with it, you can understand truly more complex hash functions – over time.

A hash function alone cannot be used for purposes that are
commercially viable. The main reason is because hash functions always
take in data at a fixed length. 1 GB of data has to be divided into
several chunks for it to be digested by the hash function. To implement
this, **hashing algorithms** are used. There are many
different types of hashing algorithms but let us consider one of the
most popular; SHA – Secure Hashing Algorithm. From this algorithm, there
are different variations; SHA-1, SHA-2, SHA-3… Let us stick with SHA-1.

SHA-1 takes in data in chunks of 512 bits. Consider then an input
that is 1024 bits long. The message shall have to be divided into two,
then run through the hash function. That is not all though. If the
products of the hashing were to be taken as is plainly and output, then
they could become compromised. An extra step is added to the process to
ensure security and this is called **padding**. In this
technique, the message is divided into data blocks and then the hash
function is repeated as many times as the data blocks.

Now you are in the clear, you are wiser.