Why Random Oracles don't work 
[May. 17th, 200912:35 am]
drostie

If you read the specs for hash functions and block ciphers, you see a lot of discussion about whether something is "distinguishable from a random oracle" or not. In fact, "distinguishers" themselves are a big part of security.
So, for example, hash functions. I've written about them before; the idea is to take a file, hash it, and you get a unique number which can substitute for that file. If you give it the same file twice (even if the filename is different) you get the same number; if you give it two different files, you get two different numbers. You might think that probably the best thing is for these numbers to be otherwise indistinguishable from a trulyrandom number generator.
Indistinguishability is a big thing, here, so it deserves a little focus. We say that something is "secure in the random oracle model" if it's secure when you replace the underlying function with a trulyrandom number generator. We say that something is "indistinguishable from a random oracle" if there's no way (except with negligible probability of success) to tell the difference between a random oracle and the output of the function. The term "random oracle" just means a function which, whenever it receives a new input, generates trulyrandom numbers on the fly as its output/response. (The word "oracle" here comes from the terminology of security proofs, where you might sometimes offer to an attacker free knowledge which they otherwise might not necessarily have  they ask an oracle which magically answers their question.)
The first problem is very obvious: how on earth do you *prove* that there's no way to distinguish your algorithm from this random oracle? Nobody has ever really developed the mathematics to do such a thing.
The security community has a clever answer to this, which is to say, "all of the buildingblock algorithms should be absolutely public, and all of our cryptanalysts should beat up on them, as much as possible. The longer something goes without someone finding a distinguisher, the more confident we are that it's indistinguishable from a random oracle. And then we prove that, if the base components are random oracles, then the bigger component is likewise a random oracle. So we put all of our confidence in the basic cryptographic building block, which everyone knows; and then we build our systems on top of it.So we don't prove that *anything* is a good substitute for a random oracle; we just prove that some things are secure in the random oracle model.
But the second problem, which was pointed out in 1998, is much more subtle and philosophical. It's this: no function is ever a blanket good substitute for a random oracle.
Here's the example from the original paper. It's a bit bizarre, but it's secure in the random oracle model. It just works like this: some person has a random oracle R(), and they have some extra function (e.g. a hash function) f, and both are tuned to the same output length. They accept a message M from the outside world, and apply some function f. Whenever f(M) = R(M), they reveal all their important secrets. The exact choice of the function f is a variable  there are *lots* of these situations depending on what you use for f.
Why would you design something like this? Well, you wouldn't  there's no point to needlessly revealing all your secrets. BUT, this is secure in the random oracle model: no matter how the outside world chooses M or f(M), the random oracle will almost never choose the same number, and your secrets are safe.
But now suppose you try to implement the random oracle with some special pseudorandom function G. Well, for most of these protocols, the protocol remains secure  but for at least one of them, where f = G, you're now revealing all of your important secrets for *all* input messages! "Whenever G(M) = R(M) we reveal all our secrets" is perfectly secure in the random oracle model  but it's perfectly insecure if you try to use G as your R function.
You might try to broaden the definition by using a *key*  just as block ciphers use a key to encrypt and decrypt their plaintext. So instead of *one* function G, we have billions of functions G_{k}, one of which gets chosen at random to be your random oracle substitute. We also have a suite of random oracles, but we only really need one: just feed the message (k, M) [the concatenation of k and M] instead of the message M, and you select different inputs depending on different k's. Still, we have something like R_{k} : can a *family* of functions behave like a *family* of random oracles in all circumstances, even though one function can't always?
Notice that this evades the previous problem, because I have to choose one particular function f as part of the protocol there. Now I might choose one of many functions to fulfil my random number generator, while f is just some fixed function.
Well, we can do the same thing, we just have to be a little more tricky. The first thing we assume is that the adversary knows k  because it's supposedly not a security parameter; it's just an arbitrary choice of a random number generator.
Well, then define a relation ~, which is kind of like an equals sign, except instead of =, which holds when both sides are the exact same number, k ~ x holds when R_{k}(k) = x. If R is a random oracle, then no matter what the k is, the left side is a random number and x probably doesn't match, so it will be hard for our adversary to find some (k,x) pair so that k ~ x. On the other hand, if you replace R with a family of functions G_{k}, and tell the adversary k, then for some protocols  the ones where you have something analogous to the "f = G" idea above  you just reveal all of your secrets.
Here's one example they give. Imagine that we have a perfectly secure cryptosystem which encrypts with E(key, plaintext) and decrypts with D(key,ciphertext), and assume that it's completely and totally secure. We're going to build a system on top of this, with a random oracle R that implements ~. The idea is that we have a secret key K, an oracle key k, and then take a message M; we truncate the "first part" of M as m  however long our oracle output is. If k ~ m, then we output the message, (1, M); otherwise we output the message (2, E(K,M)).
This allows for what's sometimes called a "watermarking attack," leaving a watermark on disk that proves that you tried to store a particular message. See, if you're using G_{k} to implement R, then I just give you the message (G_{k}(k), watermark) and you've got a watermark on your disk. If you had been using a real random oracle I wouldn't have been able to guess its output ahead of time (it's random!) and thus wouldn't have been able to construct this message except by chance.
Let's make it even worse: when you decrypt (3, M)  which we didn't define above in encryption  let's say that we check if k ~ M and then output K  the secure cryptosystem's private key. Again, if you've got a random oracle, I can't touch you; but if you're using the family of Gfunctions, my chosen ciphertext attack reveals your private secret K! The system is perfectly secure until you use an algorithm to implement it.
These are all contrived examples, but they show that no function is a good substitute for a random oracle in *all* circumstances  there will always be systems which are secure in the random oracle model but aren't secure in the real world with a given pseudorandom function.
Despite these contrivances, indistinguishability is still a top concern. Why? Because we don't actually have much else to work with. That's just the nature of the beast. 

