Hashing algorithms have been in the tech news a lot lately. What are they and why is everyone trying so hard to break them?

The latest news concerns the SHA-1 algorithm, which has been declared dead now that a team of researchers from the Centrum Wiskunde & Informatica (CWI) institute and Google Research have found a way to create two documents containing different content that generate the same hash. In crypto terms, this is called a collision.

This is big news because about 20% of websites that use certificates still rely on SHA-1. If yours is one of them, your administrators have some scrambling to do.

Let’s back up and explain briefly what hashing is, how it differs from encryption and why collisions are such a big deal. A hashing algorithm runs input text through a one-way function that turns it into an unintelligible string of gibberish, with the output always being the same length for a given algorithm. When hashing is used to protect passwords, a random string of data called a “salt” is first added to the front or back of the password. The password plus the salt are then run through the hash algorithm to create a character string that is, for practical purposes, unique.

The authentication system can then store the salt plus the hash instead of the password to validate access attempts. Any time there’s a login attempt, the salt is applied to the password that’s entered and the resulting “salted hash” is compared to the one stored in the password table. If they match, then the password is valid and access is granted. If they don’t, the password is rejected.
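To make the store-and-compare flow concrete, here is a minimal Python sketch. The function names create_record and verify are made up for illustration, and SHA-256 is used only to keep the example short. A production system would typically use a deliberately slow password-hashing function rather than a single fast hash.

```python
import hashlib
import hmac
import os
from typing import Tuple


def create_record(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, salted_hash), the pair stored instead of the password."""
    salt = os.urandom(16)  # a fresh random salt for each account
    salted_hash = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, salted_hash


def verify(password: str, salt: bytes, stored_hash: bytes) -> bool:
    """Re-apply the stored salt to the login attempt and compare the hashes."""
    candidate = hashlib.sha256(salt + password.encode("utf-8")).digest()
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(candidate, stored_hash)


salt, stored_hash = create_record("correct horse")
print(verify("correct horse", salt, stored_hash))  # True
print(verify("wrong guess", salt, stored_hash))    # False
```

Note that the password itself never appears in what gets stored. Only the salt and the resulting hash are kept.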

The beauty of salted hashing is that it enables password authentication to work without requiring that the password itself be stored. Once the salted hash is created, the password can even be thrown away. Anyone who steals the password file only gets a bunch of gibberish characters that are nearly impossible to decode*. Even if multiple accounts use the same password, the randomly generated salts ensure that the hash values are different.
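A quick sketch shows why the salt matters when two people pick the same password. The account names and the salted_hash helper are hypothetical, and SHA-256 again stands in for whatever algorithm a real system would use.

```python
import hashlib
import os


def salted_hash(password: str, salt: bytes) -> str:
    """Hash the salt followed by the password and return a hex string."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


password = "letmein"  # both accounts choose the same weak password

# Without a salt, identical passwords give identical hashes.
unsalted_alice = hashlib.sha256(password.encode("utf-8")).hexdigest()
unsalted_bob = hashlib.sha256(password.encode("utf-8")).hexdigest()
print(unsalted_alice == unsalted_bob)  # True

# With a random salt per account, the stored hashes differ.
alice_hash = salted_hash(password, os.urandom(16))
bob_hash = salted_hash(password, os.urandom(16))
print(alice_hash == bob_hash)  # False (barring an astronomically unlikely salt match)
```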

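So why does this matter to you? Because hashing systems only work if it is practically impossible to find two different inputs that produce the same hash. When two inputs do produce the same hash, that’s called a collision. Collisions always exist in theory, since there are more possible inputs than possible hashes, so security depends on their being too hard to find. That’s why security researchers work so hard to find weaknesses that make collisions easy to produce. The thinking is that if the good guys can find a weakness first, they can warn everybody before the bad guys have a chance to do any damage.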
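To see what a collision looks like, it helps to shrink the problem. The sketch below is a toy, not the technique used against MD5 or SHA-1: it builds a deliberately weak 16-bit hash (the hypothetical tiny_hash, which keeps only the first two bytes of SHA-256) and searches until two different messages share a hash. With only 65,536 possible outputs, a match turns up almost immediately. A full-length hash is supposed to make the same search hopelessly expensive.

```python
import hashlib
from itertools import count
from typing import Dict, Tuple


def tiny_hash(data: bytes) -> bytes:
    """A deliberately weak toy hash: the first 2 bytes (16 bits) of SHA-256."""
    return hashlib.sha256(data).digest()[:2]


def find_collision() -> Tuple[bytes, bytes]:
    """Try messages in order until two different ones share a tiny_hash."""
    seen: Dict[bytes, bytes] = {}
    for i in count():
        message = f"message-{i}".encode("ascii")
        digest = tiny_hash(message)
        if digest in seen:
            return seen[digest], message  # two distinct inputs, same hash
        seen[digest] = message


first, second = find_collision()
print(first, second, tiny_hash(first).hex())
```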

Security researchers had demonstrated collisions in an older algorithm, MD5, by 2005. Rather than relying on pure brute force, they exploited weaknesses in the algorithm’s design, and refinements of the technique made it possible to create two different inputs that produced the same hash in as little as one minute using a basic laptop computer.

At that time, SHA-1 was suspected of also being vulnerable, but no one had yet produced an actual collision. All that changed in February with the publication of a paper by CWI and Google Research that described in detail how a collision had been produced.

Bottom line: If your authentication system uses SHA-1 or MD5, you’re at risk of being breached.

You might wonder why these vulnerabilities are being discussed more than a decade after their existence first came to light. The answer is part technology, part human nature.

Switching from one hashing algorithm to another isn’t a trivial task. There are issues of backward compatibility with systems that use old hashing algorithms. Administrators must, in effect, catch every instance that uses the old algorithm and modify it. It’s time-consuming drudge work, and it’s tempting for busy admins to work on more pressing projects.

Then there’s the risk/reward tradeoff. The CWI/Google team said they committed 6,500 years of CPU computation and 110 years of GPU (graphics processing unit) computation to completing the two phases of the SHA-1 attack. They estimated that it would have cost about $110,000 worth of Amazon Web Services resources to duplicate the computing power they brought to the task. Since no one but the most determined and well-funded criminal enterprises or governments would commit those kinds of resources, it’s tempting to just hope for the best. Now that a working attack has been published, however, smarter attacks will follow.

That’s why even patches for severe vulnerabilities can take years to percolate through the user community. As recently as mid-2015 there were reports that MD5 was still in widespread use. When 200 million Yahoo credentials went up for sale online last summer, it turned out that they had been protected using MD5. That breach occurred in 2012, seven years after the first vulnerabilities were reported.

In other words, things don’t change nearly as quickly as we would like to believe. That’s why vendors are trying to push the issue this time. Google said it will publish the source code for creating a SHA-1 collision next month, along with protections for Gmail and G Suite users that defend against use of their collision technique. Security experts recommend switching to a member of the SHA-2 family, such as SHA-256, or to the newer SHA-3.
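In code, the switch itself is often a one-line change. The hard part is finding every place the old algorithm is used. As a rough illustration, assuming an application that computes document fingerprints with Python’s standard hashlib module (the fingerprint helper and the sample document are made up for this sketch):

```python
import hashlib


def fingerprint(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of data using the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


document = b"example document contents"

old_digest = fingerprint(document, "sha1")  # legacy algorithm, shown for comparison
new_digest = fingerprint(document)          # SHA-256, a member of the SHA-2 family

print(len(old_digest) * 4)  # 160 (bits in a SHA-1 digest)
print(len(new_digest) * 4)  # 256 (bits in a SHA-256 digest)
```

Because the new digests are longer, anything that stores or compares them, such as database columns or file formats, may need updating too. That is part of why migrations drag on.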

Ask your email administrator which hashing algorithm your company is using. If it’s SHA-2 or higher, you’re in good shape. If you’re greeted by a blank stare, well, you have bigger problems.

*If you think hashing sounds a lot like encryption, you’re right. The approaches are similar, but the intended outcomes aren’t. The main difference is that encrypted data is intended to be decrypted at some point, which is why keys are used. In contrast, hashed data is never intended to be decrypted.