|MD5-based SSL: totally, completely broken.
||[Jan. 10th, 2009|11:00 pm]
I think I'm going to keep teaching Crypto-in-plain-English on this blog for a while yet, so let's talk current events.|
To welcome in the new year, a group at the TU Eindhoven in the Netherlands successfully attacked SSL in a rather serious way: they created a certificate authority which every browser trusts, which could make anyone claim to be, for example, HSBC or the Bank of America -- or whoever you want. SSL is theoretically supposed to protect against that, but it has completely failed to do so.
SSL is that technology that, among other things, puts the padlock icon on your browser. Browsers try to present these visual cues to tell you that there is encryption going on "under the hood," so that someone can't just overhear your credit card number. The TUE group's live demo at the hacking conference was an open wireless router that could overhear anything you tried to transmit through it over SSL -- but all of the padlocks and browser notifications were still there.
It's called a man-in-the-middle attack. Suppose Alice wants to securely contact Bob. A malicious attacker named Mallory says "Hi, I'm Bob," and presents valid IDs and credentials to Alice. They begin a secure communication with each other. Then, Mallory impersonates Alice in a similar way when he opens a secure communication line to Bob. By sitting between these two secure communication lines, Mallory gets the entire communication recorded in his logs.
The TUE group falsified the identification of SSL: they devised a way to say "Hi, I'm HSBC" or "Hi, I'm Bank of America" and get completely trusted by the browser. They did it by breaking a hash function called MD5.
(LJ-cut text: Read on...)
I've blogged here about hash functions before: these are deterministic data-scramblers, which scramble one big message M (which could be a file, string, or whatever) into a "small" unique number h(M). The general idea of a hash function, to remind you, is that, if you have two messages M and M', then:
h(M) = h(M') if and only if M = M'In theory, hash functions are impossible. Because every number can be treated as a message. If you define S as the set of all "small" numbers, and N as the amount of numbers in S, you straightforwardly run into a paradox: treat every number in S as messages, and hash all of them. Then hash some other number. You now have N+1 hash values, but you only have N possible "small" outputs. Either at least one output isn't small, or at least one output is repeated.
But logic doesn't stop the cryptographer, because she realizes that this proof doesn't really work in practice. You just can't hash all of the numbers between 0 and 2128; there are too many of them -- over 340 trillion trillion trillion. So, a cryptographic hash function just has to be good enough. It has to be hard to find two messages, M and M', such that h(M) = h(M'). If it is, then we can pretend that the above equation is true, even though it's impossible.
Now, one hash function in popular use, MD5, is not good enough. It hasn't been since the mid-1990s. It is still being used in a whole lot of places, though. For example, if you tick the "Remember Password" box in AOL Instant Messenger, it doesn't remember your exact password; it remembers the MD5 hash of your password, encoded in Base64. That's right: AIM does not require your exact password in order to authenticate you. (There is almost1 no security benefit to this use of hash functions, by the way. Even if it were a better hash function, anybody who wants AIM access to your account can just copy the registry value from one computer to another. It makes no difference that a rainbow-tables site2 like freerainbowtables.com will also brute-force the MD5 for you, to tell you what the original password was.)
But that's not the problem with MD5.
The problem with MD5 is that it's no longer very collision-resistant: a dedicated attacker can create two messages which both MD5-hash to the same number.
This, in turn, breaks digital signatures. You see, digital signature schemes usually work by encrypting a hash value. You hash the file/message that you want to sign, then you encrypt the hash. Anybody who wants to verify the signature decrypts the hash3, then compares it to the actual hash of the message. If both are the same number, then the signature is verified.
Suppose, then, that I get you to sign a benign-looking message -- "you" in this case meaning "VeriSign," or someone else who sells SSL certificates. If your hash function isn't collision-resistant, then I can just paste your encrypted signature onto any other message which has the same MD5 hash.
In this case, they exploited something damn sneaky in SSL: they found a different message, with the same MD5 hash, which was also an SSL certificate (!) and which could claim the permission to issue its own certificates4 (!!). Since it has a valid signature from a real included-in-standard-browsers certificate authority, your browser trusts the fake certificate completely. What's worse, it also trusts any other certificates which the attacker decides to sign.
And for practical purposes, there's nothing you can do right now. Except for being glad that the hackers who broke it were professionals at a Dutch university. This isn't a zero-day exploit. (Hell, it's almost a 10-year-old exploit, in principle. I mean, I give all deference to the research that made this newer attack possible; but people have been saying "No MD5!" since I was in high school.)
Those same certificate authorities will now phase out MD5, or else they'll get booted out of web browsers. The MD5 certificates that exist will expire eventually, if they aren't revoked manually. And hopefully, decent browsers will start to pop up little alert saying, "hey, this site's signature could have been forged." Time is the only cure that you can rely on here.
1 I say that there's "almost" no security benefit to the AIM use of hash functions. The word "almost" here means that I can think of one counterexample: a person with a long, secure password, who doesn't mind someone else impersonating her on AIM, so long as they don't access her account settings and reassign the account to someone else.
2 Rainbow tables aren't just a vulnerability in MD5; they're a vulnerability for any fast hash function. A rainbow table is just a clever way to compress a lookup table. It goes like this. You create a fast function g which turns a number into a small password. Then the fast function f(m) = g( h( m ) ) will turn a small password m into another small password f(m). But this lets you construct huge chains: m, f(m), f(f(m)), f(f(f(m))), f4(m), and so on: that chain can then "stand in" for, say, a thousand different passwords. So instead of storing [m, h(m)] in your table, you store [m, h( f1000(m))]. To look up a hash H in this table, first try to look up H -- you'll succeed if one of the f1000s was your password. If that doesn't work, then try to look up h(g( H )): this will succeed if one of the f999s was your password. And so on. If you loop a thousand times and don't find anything, then your hash value wasn't on the original list.
3 If you're security-minded, you might wonder, "how do they decrypt it if they don't have my key? And if they do have my key, that means I'm giving it away to an awfully large number of people -- how do I stop them from encrypting their own hash numbers, thus forging my signature?" The answer to these questions lie in asymmetric, aka public-key, encryption, but this blog post is already too long without that.
4 A major flaw here -- as well as in the rest of SSL -- is that the certificate tells you how much you can trust it. In this case, a field called "basic constraints" says whether this certificate can be used in the middle of certificate chains, rather than just at the end of one. The team at Eindhoven cleverly switched "basicConstraints: CA=false" to "basicConstraints: CA=true" and whoosh, all of SSL dies with it. If that whole "here's how much you can trust me" speech sounds backwards to you, join the party. SSL certificates contain a lot of information like that. For example, SSL certificates can be revoked, making them useless. But you don't know whether they're revoked unless you visit a revocation list. Guess where the URL of the revocation list is kept? That's right, in the certificate. If the attacker politely forgets to include a URL in this section, well, whoops -- revocation is no longer possible.
- - -
Your moment of Zen? The people who are now switching away from MD5 are switching to another function named SHA-1. SHA-1 is much stronger than MD5, but it also has some serious flaws, and a couple of years of time will probably break that, too. They could solve this by using a newer, better generation called SHA-2. Every even-vaguely-modern browser supports it, as far as I know. And, in fact, right now there's a contest for developing the next generation after that, called SHA-3. (I'm rooting for Skein in that competition. Tweakable block ciphers are very pretty.)
I find these posts fascinating, informative, and well-written, and I'm glad you're making them. :)
Heh. I find that I keep editing them for hours after I've written them.
I'm always asking myself, "could my mother understand this?" -- and I'm not satisfied until I'm convinced that she might get the general gist of it.
Well, not knowing your mother I can't speak to her usefulness as a yardstick of the lay-person's comprehension abilities, but I have negligible background in cryptography* and I understood this pretty darn well.
*"Negligible" here translating to "I've had a few interesting conversations with people who studied it in passing as a hobby".
I do the same thing with mine. It's almost obsessive.
I've even got several stashed away in private entries as "works in progress." :/
It helps to keep in mind Jobs' dictum: Real artists ship.
Every second line inspired half-hour long wikipedia tours. Fascination laced with agony.
Still going through the link. :/