What is hashing and how does it work?
Hashing is a cryptographic method that converts data of any length into compact, fixed-length hash values. Unlike encryption, hashing is a one-way process: there is no key that turns a hash value back into the original data. For this reason, hashing is used to manage and secure databases and user data, for password management and for access authentication.
What is hashing?
Hashing is an important cryptographic instrument used to convert data into hash values. To do this, a special hash function is used, usually in the form of an algorithm. The name describes the process: to hash something is to chop it up. Datasets such as passwords, company and user data, and other forms of data are hashed and converted into a new, shorter form known as a hash value. For a given hash function, all hash values have the same length, and each one represents an original dataset. Hash values can also serve as index values in a hash table, which makes the stored data quick to find.
The pros of hashing: Hash values cannot be converted back to their original values, because a hash function works in one direction only and has no key. Even if criminals were able to access the hash values, they could not read the original data from them directly. People often confuse hashing with encryption. Since hash values are not encrypted but converted into a completely new character string, they cannot be decrypted. An attacker can only guess candidate inputs, hash them with the same algorithm and compare the results.
How does hashing work?
Hashing is built on three components:
- Hash functions: A hash function is an algorithm that takes data of any length, mixes it thoroughly and converts it into a character string of fixed length. These hash values are usually much shorter and more compact than the original values.
- Hash values: Hash values are the result of the hash function. Unlike the original values, they always have the same length and are usually written as hexadecimal strings. The fixed length depends on the hash function used.
- Hash tables: Hash values can be used to organize data in what’s known as a hash table. To do this, each dataset is given an index value derived from its hash, which shows where the dataset is stored. This significantly shortens lookup times and reduces the computing power required when searching for information.
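The three parts above can be sketched in a few lines of Python. This is a minimal illustration, assuming SHA-256 from the standard `hashlib` module as the hash function; the helper names `hash_value` and `bucket_index` and the bucket count of 8 are hypothetical choices made for this example.

```python
import hashlib

def hash_value(data: str) -> str:
    """Hash function: map input of any length to a fixed-length hex string."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

def bucket_index(data: str, num_buckets: int = 8) -> int:
    """Derive a hash-table index from the hash value."""
    return int(hash_value(data), 16) % num_buckets

short = hash_value("abc")
long_ = hash_value("a much longer input string " * 100)

# Hash values: both digests have the same length, whatever the input size.
print(len(short), len(long_))  # 64 64

# Hash table: each record is filed under the index derived from its hash.
table = [[] for _ in range(8)]
for name in ["alice", "bob", "carol"]:
    table[bucket_index(name)].append(name)
```

Looking up a name later only requires recomputing `bucket_index(name)` and checking that one bucket.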
The way hashing works is based on five characteristics which guarantee security and reliability:
- Deterministic: The hash function must always create the same fixed-length hash value for the same entry, whatever the length of that entry.
- Unreadable: Hashing converts original values into hash values which are in themselves “unreadable”. There is no way to decrypt the hash values in the classic sense and recreate the original text or character string.
- Collision resistance: It should be practically impossible to find two different entries with the same hash value. If two original values get the same hash value, this is called a collision. Because a hash function maps an unlimited number of possible entries to a limited number of hash values, collisions exist in theory, but a secure hash function makes them infeasible to find. This reduces the chance of attack and increases security. However, depending on the situation, a collision may actually be desired.
- Continuity or non-continuity: In general, hash values offer more security if they are not continuous. In other words, similar original datasets should produce hash values that are as different as possible. However, continuous hash values might be better if you are using hashing to manage similar datasets and entries.
- Speed: Hashing not only improves security, but it also allows for quicker access to databases.
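Two of the characteristics above, determinism and non-continuity, are easy to observe directly. The following is a small sketch, assuming SHA-256 as the hash function; the helper name `sha256_hex` and the sample inputs are made up for illustration.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash value of a string as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Deterministic: hashing the same input twice gives the same value.
same_both_times = sha256_hex("password123") == sha256_hex("password123")
print(same_both_times)

# Non-continuity: changing one character gives an unrelated-looking value.
a = sha256_hex("password123")
b = sha256_hex("password124")
differing = sum(1 for x, y in zip(a, b) if x != y)
print(f"{differing} of {len(a)} hex characters differ")
```

With a secure hash function, most of the hexadecimal characters are expected to change even though the two inputs differ in a single character.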
Hashing applications
The uses and functions of hashing are particularly clear when we look at different applications and areas of use. Typical applications include:
- The creation of hash tables
- The “encryption” of important data
- The search for duplicates
- Checksums and digital signatures
- The search for similar data
- Authentication systems
- Caching
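As an illustration of one item from the list above, the search for duplicates, the sketch below groups items by the hash value of their content. It assumes SHA-256 and uses in-memory byte strings in place of real files; the function name `find_duplicates` and the sample data are hypothetical.

```python
import hashlib
from collections import defaultdict

def find_duplicates(items: dict[str, bytes]) -> list[list[str]]:
    """Group item names whose contents produce the same SHA-256 hash value."""
    by_digest = defaultdict(list)
    for name, content in items.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    # Only groups with more than one name are duplicates.
    return [names for names in by_digest.values() if len(names) > 1]

files = {
    "report.txt": b"quarterly figures",
    "copy_of_report.txt": b"quarterly figures",
    "notes.txt": b"meeting notes",
}
print(find_duplicates(files))  # [['report.txt', 'copy_of_report.txt']]
```

Only the short hash values are compared with each other, so large items never have to be compared byte by byte.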
Database management
Hashing has a special benefit in that it allows data to be organized efficiently in the form of hash tables. Character strings are summarized as hash values, which serve as index values in the hash table. This speeds up the search for certain entries and can reduce the memory needed for indexes. Hash values and hash tables improve the organization and management of index and data infrastructure.
Example: Customer databases often contain important information such as customers’ names, contact details or addresses. If a database needs to be searched for specific information, a normal search will take a while. This is because the whole database must be looked through to find the value you’re looking for. Hashing, however, enables data blocks to be given a set address position in the database. This means a computer can calculate the hash value of the search term and jump directly to the matching position in the hash table.
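The difference between the two kinds of search can be sketched as follows. Python's built-in `dict` is implemented as a hash table, so it stands in for the database index here; the customer records and function names are invented for this example.

```python
# Hypothetical customer records, used only for illustration.
customers = [
    {"id": "C-1001", "name": "Ada"},
    {"id": "C-1002", "name": "Grace"},
    {"id": "C-1003", "name": "Linus"},
]

def linear_search(records, customer_id):
    """Normal search: look at every record until the ID matches."""
    for record in records:
        if record["id"] == customer_id:
            return record
    return None

# Hash-based index: dict hashes each key to decide where the record is stored.
index = {record["id"]: record for record in customers}

def indexed_search(customer_id):
    """Hash lookup: jump to the stored position derived from the key's hash."""
    return index.get(customer_id)

print(linear_search(customers, "C-1003"))
print(indexed_search("C-1003"))
```

Both calls return the same record, but the linear search takes longer as the list grows, while the hash lookup takes roughly the same time on average regardless of size.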
Digital signatures and checksums
Hashing also plays an important role as an authentication method. For example, it is used when creating digital signatures, where a hash value serves as a digital fingerprint of the message. This means that the communication integrity between the sender and the receiver can be confirmed. Hashing is also used when new user accounts are created and linked to passwords. When a new user account is created, a hash function creates a hash value for the password that’s been chosen. For any future login, the password entered is hashed and compared with the stored hash value. When a password is reset, a new hash value is created for the new password.
Example: With digital signatures, you can check whether messages, downloads or even websites are authentic. To do this, the sender creates a hash value from the message, or delivers a hash value along with a program download. The receiver then creates a hash value using the same hash function. This is compared with the hash value that was sent along with the data, which is usually signed. The best-known example of this on the web is SSL/TLS. In this case, the web server sends a server certificate to the browser, and the signature on the certificate is checked with the help of a hash function. Browser and server then agree on a session key, and hash values are used to confirm that the handshake messages were not altered. Following this authentication, HTTPS data exchanges can take place. SFTP protects data integrity in a similar fashion.
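The comparison step described above can be sketched as a simple checksum check. This is only the hashing half of the process: a real digital signature would additionally sign the hash value with a private key, which is left out here. SHA-256 is assumed, and the function names and sample data are hypothetical.

```python
import hashlib
import hmac

def sha256_checksum(data: bytes) -> str:
    """Create the hash value that the sender publishes alongside the data."""
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, published_checksum: str) -> bool:
    """Receiver side: hash the received data and compare both values."""
    return hmac.compare_digest(sha256_checksum(data), published_checksum)

original = b"installer contents"
published = sha256_checksum(original)

print(verify(original, published))              # True
print(verify(b"tampered contents", published))  # False
```

`hmac.compare_digest` is used for the comparison because it takes the same time whether or not the values match.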
Passwords and other sensitive data
When saving sensitive data such as passwords, login and user data, hashing offers a high level of security. This is because the data is not stored in its original form or “simply” encrypted in the database. Instead, the datasets are reduced to hash values from which the original values cannot be read, even when stolen. When a password is entered, its calculated hash value is compared with the saved hash value. Hashing is also sometimes used in caches, making temporarily stored data such as websites or login and payment data unreadable to those not authorized to see it.
Example: Saving diverse content such as text documents or audio and video files can be made more secure by hashing. The binary structure of each file is converted into a compact hash value which is referenced in the data block it belongs to. Since hash values are linked to a position in the database, the data can be found faster, and the hash values themselves reveal nothing readable about the original content to cyber criminals.
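The store-and-compare flow for passwords might look like the following sketch. It assumes PBKDF2 with SHA-256 from the standard library plus a random salt per user, which is common practice; the in-memory `users` dictionary, the function names and the iteration count are assumptions made for this example and not a recommendation for production settings.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor for this sketch

users = {}  # hypothetical in-memory store: name -> (salt, password hash)

def _derive(password: str, salt: bytes) -> bytes:
    """Turn a password and salt into a hash value using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def register(name: str, password: str) -> None:
    """Store only the salt and the hash value, never the password itself."""
    salt = os.urandom(16)
    users[name] = (salt, _derive(password, salt))

def login(name: str, password: str) -> bool:
    """Hash the entered password and compare it with the saved hash value."""
    if name not in users:
        return False
    salt, stored = users[name]
    return hmac.compare_digest(_derive(password, salt), stored)

register("alice", "correct horse battery staple")
print(login("alice", "correct horse battery staple"))  # True
print(login("alice", "wrong guess"))                   # False
```

The database in this sketch never holds the password, only the values needed to check a later login attempt.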
What are the pros of using hashing?
The pros of hashing at a glance:
- Sensitive data can be securely managed and takes up less space.
- The datasets converted to hash values cannot be returned to their original form, since hash values cannot be “decrypted”.
- Access to databases is quicker since hash values are associated with positions within the database.
- Stolen hash values do not reveal the original data directly; attackers would have to guess possible inputs and hash them with the same hash function.
- The secure exchange of data, messages or software can be authenticated or signed by using hashing.
Hashing and blockchains
Hashing and hash functions are central components in a blockchain. To authenticate transactions using cryptocurrencies such as Bitcoin, hashes are created during mining. Bitcoin, for example, uses the SHA-256 hashing algorithm, which converts a character string of any length into one of a set length, in other words a hash of 256 bits, usually written as 64 hexadecimal characters. Hashes can then be used to legitimize, authenticate and document official crypto transactions, store them in the blockchain and ensure a higher level of security.
Hashing has three primary functions within the blockchain:
- Mining: The computing power used for mining on a crypto network is measured as its hash rate, which shows how many hashes the miners calculate per second. Miners create hashes by completing mathematical calculations. If the hash is valid, the transaction block is validated and the miner is rewarded with coins or tokens. The higher a miner’s hash rate, the better the chance of earning that reward. Crypto mining is, therefore, based on hashing algorithms.
- Blockchain: Recorded and validated transactions are documented sequentially as blocks. They are added to the blockchain as part of the mining process. Each block is then linked to the one before it and contains the hash value of the previous block. This prevents an invalid or damaged block from being added.
- Key generation: Hashing is also used when transferring cryptocurrencies. Each transfer is hashed and then signed with the sender’s private key, so that it can be authenticated by anyone using the matching public key.
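The mining and chaining ideas above can be sketched with a toy chain. This is a heavily simplified illustration and not how Bitcoin encodes blocks: it assumes a block is a small dictionary serialized as JSON, uses a single SHA-256 pass, and treats a hash as valid when it starts with three zeros. All names and the difficulty value are hypothetical.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's contents, including the previous block's hash."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def mine(index: int, transactions: list, previous_hash: str, difficulty: int = 3) -> dict:
    """Try nonces until the block hash starts with `difficulty` zeros."""
    nonce = 0
    while True:
        block = {"index": index, "transactions": transactions,
                 "previous_hash": previous_hash, "nonce": nonce}
        if block_hash(block).startswith("0" * difficulty):
            return block
        nonce += 1

def is_valid(chain: list, difficulty: int = 3) -> bool:
    """Check every link to the previous block and every block's hash."""
    for previous, current in zip(chain, chain[1:]):
        if current["previous_hash"] != block_hash(previous):
            return False
    return all(block_hash(b).startswith("0" * difficulty) for b in chain)

genesis = mine(0, ["genesis"], "0" * 64)
second = mine(1, ["alice pays bob 5"], block_hash(genesis))
chain = [genesis, second]
print(is_valid(chain))  # True

# Changing an old block changes its hash, so the link from the next block breaks.
genesis["transactions"] = ["alice pays mallory 500"]
print(is_valid(chain))  # False
```

Raising `difficulty` makes a valid hash rarer, so more attempts are needed per block on average.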
How safe is hashing?
Experts generally agree that using hashing is one of the safest ways to store sensitive data in databases. Hashing is not like standard encryption since hash values cannot be “decrypted” to reveal the original datasets. Even brute force attacks, which work by trying combinations of characters until they hit a match, would need an astronomical number of tries when the original values are long and random.
Some bad actors also compare lists of stolen hash values with rainbow tables. These are precomputed tables of hash values and the passwords that produce them. If a hash value from the database matches a hash value in the rainbow table, the attacker learns the assigned password, which creates a security gap. This is why it’s important, even when using hashing, to regularly change passwords, carry out regular updates and use up-to-date hashing algorithms. The Internet Engineering Task Force (IETF) recommended the following hashing algorithms in 2021:
- Argon2
- Bcrypt
- Scrypt
- PBKDF2
Another way to make hashing even more secure is to use additional techniques such as salting and peppering. With salting, an additional, randomly created character string is added to every password before it is converted to a hash value. Using salts of at least 16 characters makes precomputed attacks such as rainbow tables impractical and offers another level of reliable security. If a secret 32-character code called a “pepper” is then added to all passwords and kept separate from the database, any stolen hash values with the added salt would be even more difficult to crack.
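The effect of salt and pepper can be sketched as below. This is one possible way to combine them, not a standard: the pepper is mixed in with HMAC-SHA256 and the salted result is derived with PBKDF2. The pepper value, the function name, the sample password and the iteration count are assumptions for illustration; a real pepper would be a random secret stored outside the database.

```python
import hashlib
import hmac
import os

# Hypothetical application-wide secret, kept outside the database.
PEPPER = b"example-pepper-32-characters-xxx"

def hash_password(password: str, salt: bytes) -> bytes:
    """Hash a password with a shared pepper and a per-user salt."""
    # Pepper: mix the secret into the password using HMAC-SHA256.
    peppered = hmac.new(PEPPER, password.encode("utf-8"), hashlib.sha256).digest()
    # Salt: a random per-user value fed into a slow key-derivation function.
    return hashlib.pbkdf2_hmac("sha256", peppered, salt, 100_000)

salt_a, salt_b = os.urandom(16), os.urandom(16)
hash_a = hash_password("hunter2", salt_a)
hash_b = hash_password("hunter2", salt_b)

# Same password, different salts: the stored hash values do not match.
print(hash_a != hash_b)  # True
```

Because two users with the same password end up with different hash values, a single precomputed table cannot be reused across accounts.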