The new v0.4.0 release of the OpenTimestamps Client includes an interesting example: one timestamp proof for two different files. This means you can do the following:

$ cd examples/sha1
$ diff a b
Binary files a and b differ
$ ots verify -f a a_or_b.ots
Success! Bitcoin attests data existed as of Wed May 17 21:45:45 2017 EDT
$ ots verify -f b a_or_b.ots
Success! Bitcoin attests data existed as of Wed May 17 21:45:45 2017 EDT

As you probably already guessed, this has something to do with the SHA1 cryptographic hash function. The a_or_b.ots timestamp proof hashes its input with SHA1:

$ ots info a_or_b.ots | head -n 1
File sha1 hash: f92d74e3874587aaf443d1db961d4e26dde13e9c

…and while files a and b are different, they have the same SHA1 digest:

$ sha1sum a b
f92d74e3874587aaf443d1db961d4e26dde13e9c  a
f92d74e3874587aaf443d1db961d4e26dde13e9c  b

This is possible because SHA1 is broken: it’s vulnerable to collision attacks, allowing an attacker finds two different messages with the same SHA1 digest. For many cryptographic protocols collision attacks are a disaster. For example, since Git identifies source code by SHA1 hash, a software vendor could send an security auditing team a clean version of a software package, get the auditors sign off on the software being secure, then secretly send a backdoored version of the software with the same SHA1 digest to unsuspecting users.

To make a long story short, cryptographers have been recommending against using SHA1 for over a decade, prior to Git’s initial release in fact.

So why are SHA1 operations allowed in OpenTimestamps proofs if SHA1 is broken?

Remember that a timestamp proof simply proves that a message existed prior to a particular time in the past. Fortunately for us, SHA1 is only vulnerable to collision attacks, not preimage attacks.

The difference is that in a collision attack, the attacker is finding two different messages, \(m_1\) and \(m_2\), that have the same digest. While there are a variety of different types of collision attacks, they all share something in common: when \(m_1\) and \(m_2\) are found, they’re found at the same time.

Thus our SHA1-using timestamp proof is secure: we know a and b existed at the same time, prior to when the Bitcoin block was created.

Preimage Attacks

So what would it take for a timestamp proof to be insecure? A preimage attack.

Here the attacker starts with a existing digest \(d\), and has to find a new message \(m\) that has the same digest. In the context of a timestamp proof, the new message \(m\) makes the proof invalid: it didn’t exist when the proof was created.

Fortunately, for a variety of reasons the underlying math makes preimage attacks much more difficult than collision attacks. As Zooko Wilcox notes in the History of Hash Function Attacks while collision attacks have been found for a large number of well-known cryptographic hash functions (albeit all hash functions designed prior to 1998), the only well-known cryptographic hash function has ever been found to be vulnerable to a preimage attack is Snefru-2, designed in 1990.

SHA1 and Security Engineering

So does this mean we can just use SHA1 everywhere? It’s faster and smaller than SHA256 after all. Better yet, no-one’s found a preimage attack for MD5 - it’s even faster and smaller than SHA1!

Not so fast! There’s a lot of good reasons to be dubious about using SHA1. In fact, the only reason why OpenTimestamps supports SHA1 at all is for compatibility with SHA1-using systems like Git and the Internet Archive. If not for compatibility considerations OpenTimestamps would support exactly one hash function, SHA256; far more security vulnerabilities have been caused by bad code than bad math.

Secondly that SHA1 digests can collide breaks assumptions people make about hash functions. For example, it might be useful for a database of timestamp proofs to include an index that maps output digest to input message: SHA1 collisions break that assumption. While it’s good to avoid bad assumptions, it’s safer if you don’t have too. For this reason alone we may further restrict the use of SHA1 in the future in OpenTimestamps, perhaps relegating it to some kind of “safeties off” mode.

Finally, safety margins are a good thing. I sleep better at night knowing that the vast majority of OpenTimestamps proofs are vulnerable to breaks in exactly one hash function, SHA256. I know that if SHA256 is weak, it’s likely to fall to a collision attack first, giving me a few years figure out how to move to a better hash function. And I know that because of Bitcoin, there’s tremendous incentives to break that hash function; the fact that no-one has found a way to break it already is a good sign!