||[Jun. 21st, 2009|09:12 pm]
I only really aced one out of the four big questions on my molecular electronics exam. Oddly enough, it also had a serious typo in the Hamiltonian, and didn't really define its own language. It took me 10-20 minutes to decipher what was going on, and then the TA wanted 10-20 minutes to verify that my version was indeed correct. Here's the problem:|
Consider the motion of an electron on a linear chain of hopping sites separated by a distance a. The motion is described by the following tight-binding Hamiltonian: If you read this question the obvious way -- by deleting the + symbol that doesn't mean anything, you get:
H = -t Σn=-∞∞ ( | n > < n+1+ | n+1 > | < n | )a) What is the meaning of t? What is the unit of t?
b) Show that plane waves are eigenfunctions of this Hamiltonian.
c) Find the expression for the dispersion relation for the electrons and plot it. How large is the band width?
d) Show that at low energy the dispersion relation is that of a free electron with an effective ass. Find the expression of the effective mass as a function of the band width.
H = Σn | n > < n+1 | n+1 > < n | BUT -- and this is a big but -- since wavefunctions are always normalized, < a | a > is just trivially 1. That makes this the sum of all | n > < n | states, but if they're exhaustive and orthogonal, then that's just the identity matrix. So the Hamiltonian is trivially "H = -t." No good.
The key here is that there's an extra | symbol which is in between a ket and a bra on the rightmost part -- and it doesn't mean anything there. It was misplaced, and the interaction is supposed to read:
| n > < n+1 | + | n+1 > < n | Which is indeed a nearest-neighbors coupling.
Anyway. Cute problem.
|Why Random Oracles don't work
||[May. 17th, 2009|12:35 am]
If you read the specs for hash functions and block ciphers, you see a lot of discussion about whether something is "distinguishable from a random oracle" or not. In fact, "distinguishers" themselves are a big part of security.|
So, for example, hash functions. I've written about them before; the idea is to take a file, hash it, and you get a unique number which can substitute for that file. If you give it the same file twice (even if the filename is different) you get the same number; if you give it two different files, you get two different numbers. You might think that probably the best thing is for these numbers to be otherwise indistinguishable from a truly-random number generator.
Indistinguishability is a big thing, here, so it deserves a little focus. We say that something is "secure in the random oracle model" if it's secure when you replace the underlying function with a truly-random number generator. We say that something is "indistinguishable from a random oracle" if there's no way (except with negligible probability of success) to tell the difference between a random oracle and the output of the function. The term "random oracle" just means a function which, whenever it receives a new input, generates truly-random numbers on the fly as its output/response. (The word "oracle" here comes from the terminology of security proofs, where you might sometimes offer to an attacker free knowledge which they otherwise might not necessarily have -- they ask an oracle which magically answers their question.)
The first problem is very obvious: how on earth do you *prove* that there's no way to distinguish your algorithm from this random oracle? Nobody has ever really developed the mathematics to do such a thing.
The security community has a clever answer to this, which is to say, "all of the building-block algorithms should be absolutely public, and all of our cryptanalysts should beat up on them, as much as possible. The longer something goes without someone finding a distinguisher, the more confident we are that it's indistinguishable from a random oracle. And then we prove that, if the base components are random oracles, then the bigger component is likewise a random oracle. So we put all of our confidence in the basic cryptographic building block, which everyone knows; and then we build our systems on top of it.So we don't prove that *anything* is a good substitute for a random oracle; we just prove that some things are secure in the random oracle model.
But the second problem, which was pointed out in 1998, is much more subtle and philosophical. It's this: no function is ever a blanket good substitute for a random oracle.
Here's the example from the original paper. It's a bit bizarre, but it's secure in the random oracle model. It just works like this: some person has a random oracle R(), and they have some extra function (e.g. a hash function) f, and both are tuned to the same output length. They accept a message M from the outside world, and apply some function f. Whenever f(M) = R(M), they reveal all their important secrets. The exact choice of the function f is a variable -- there are *lots* of these situations depending on what you use for f.
Why would you design something like this? Well, you wouldn't -- there's no point to needlessly revealing all your secrets. BUT, this is secure in the random oracle model: no matter how the outside world chooses M or f(M), the random oracle will almost never choose the same number, and your secrets are safe.
But now suppose you try to implement the random oracle with some special pseudorandom function G. Well, for most of these protocols, the protocol remains secure -- but for at least one of them, where f = G, you're now revealing all of your important secrets for *all* input messages! "Whenever G(M) = R(M) we reveal all our secrets" is perfectly secure in the random oracle model -- but it's perfectly insecure if you try to use G as your R function.
You might try to broaden the definition by using a *key* -- just as block ciphers use a key to encrypt and decrypt their plaintext. So instead of *one* function G, we have billions of functions Gk, one of which gets chosen at random to be your random oracle substitute. We also have a suite of random oracles, but we only really need one: just feed the message (k, M) [the concatenation of k and M] instead of the message M, and you select different inputs depending on different k's. Still, we have something like Rk : can a *family* of functions behave like a *family* of random oracles in all circumstances, even though one function can't always?
Notice that this evades the previous problem, because I have to choose one particular function f as part of the protocol there. Now I might choose one of many functions to fulfil my random number generator, while f is just some fixed function.
Well, we can do the same thing, we just have to be a little more tricky. The first thing we assume is that the adversary knows k -- because it's supposedly not a security parameter; it's just an arbitrary choice of a random number generator.
Well, then define a relation ~, which is kind of like an equals sign, except instead of =, which holds when both sides are the exact same number, k ~ x holds when Rk(k) = x. If R is a random oracle, then no matter what the k is, the left side is a random number and x probably doesn't match, so it will be hard for our adversary to find some (k,x) pair so that k ~ x. On the other hand, if you replace R with a family of functions Gk, and tell the adversary k, then for some protocols -- the ones where you have something analogous to the "f = G" idea above -- you just reveal all of your secrets.
Here's one example they give. Imagine that we have a perfectly secure cryptosystem which encrypts with E(key, plaintext) and decrypts with D(key,ciphertext), and assume that it's completely and totally secure. We're going to build a system on top of this, with a random oracle R that implements ~. The idea is that we have a secret key K, an oracle key k, and then take a message M; we truncate the "first part" of M as m -- however long our oracle output is. If k ~ m, then we output the message, (1, M); otherwise we output the message (2, E(K,M)).
This allows for what's sometimes called a "watermarking attack," leaving a watermark on disk that proves that you tried to store a particular message. See, if you're using Gk to implement R, then I just give you the message (Gk(k), watermark) and you've got a watermark on your disk. If you had been using a real random oracle I wouldn't have been able to guess its output ahead of time (it's random!) and thus wouldn't have been able to construct this message except by chance.
Let's make it even worse: when you decrypt (3, M) -- which we didn't define above in encryption -- let's say that we check if k ~ M and then output K -- the secure cryptosystem's private key. Again, if you've got a random oracle, I can't touch you; but if you're using the family of G-functions, my chosen ciphertext attack reveals your private secret K! The system is perfectly secure until you use an algorithm to implement it.
These are all contrived examples, but they show that no function is a good substitute for a random oracle in *all* circumstances -- there will always be systems which are secure in the random oracle model but aren't secure in the real world with a given pseudorandom function.
Despite these contrivances, indistinguishability is still a top concern. Why? Because we don't actually have much else to work with. That's just the nature of the beast.
|drostie.org mostly out of limbo
||[May. 12th, 2009|03:46 pm]
I had underestimated the previous months' schoolwork, which really required a bunch of effort. But the end of April and early May, I pushed very hard, and basically the entire Django backend of drostie.org has been written. This included one day where I just sat down and coded my own admin interface, because I didn't like the one that came with Django and I could tell it was only a ~ 1 day project. The admin interface looks like this:|
After Apache was too much of a resource hog for my taste, I elected to start running Django out of Lighttpd. Surprisingly, I found Lighty to be more versatile than Apache -- though the documentation is *abominable* and there were several features missing. (For example, client certificate support is presently restricted to LDAP, which is quite like saying "You can pick your nose, but only with this shotgun.") But mod_auth can work for my administrative backend just as well as a client cert, so long as Firefox remembers my password and my computer doesn't die spectacularly.
In particular, here's the problem that drove me to lighttpd: I want several subdomains, like code.drostie.org and aca.drostie.org and muse.drostie.org and odd.drostie.org and so forth. They shouldn't share file namespaces (a text file on code shouldn't be visible from odd; the images on odd shouldn't be visible from code), except for some common files that should only be in one place, so that I can update them all at once. That's the business of a rewrite engine, of course, which Apache can only do with mod_rewrite voodoo; and mod_rewrite is a notoriously difficult module for Apache because it implements basically a full-featured scripting language in a medium that was never designed for it. I basically learned Lua overnight in order to program a quick mod_magnet script to do the exact redirection I wanted. I still don't like Lua, but I can live with Lua.
Meanwhile, Apache does *not* like partial host-awareness: you're supposed to either run virtual hosts or to run a single dedicated host. Well, with the rewrite script, you can see that I don't need host-awareness usually -- the file logic is rewritten manually for that. But, here's the problem: some of my pages (the admin ones) use HTTP-based logins to save some coding effort. This *has* to be host-aware, since the server has no other direct way to distinguish these pages from any other page served by Django. (I could have also hardcoded the authentication credentials into Django, but I originally wanted to use an SSL client certificate to do this process. I'd still like to switch over, whenever lighttpd actually implements client certificate support.)
Now, these subdomains must also direct their django calls to some central django applet, which I decided from the start would be host-aware and would implement certain "Projects" (code, odd, aca). This is *supposed* to be done with Apache's mod_python or mod_wsgi, but this also tends to break with virtual hosting. (When I tried to have WSGI kibbutz on another server's WSGI process, it got angry at me.) The native separation in Lighttpd between the web server and the FastCGI Django server made complete sense to me, in this respect. And I had no real issue setting up the pidfiles or the Unix socket to make it all work nice and smooth.
It's all coming together. "Push, push, push," I keep telling myself. "Do something -- anything -- even if it's typing a meaningless line into the source code. Get your brain moving on the project and progress will just happen."
And it is.
||[Mar. 22nd, 2009|10:06 am]
The saga of emacs-vs-vim is legendary, and hearing the passionate rhetoric on both sides, you get the impression that you should at least try both out, and see what the fuss is about. Of course, everybody will admit that neither one is for "just trying": vim takes, by various estimates by vim users, weeks-to-months-to-years to learn to use properly; apparently Emacs requires learning a Lisp dialect and building up configuration files. The goal of vim is that you go to any *nix terminal you want and suddenly can edit source code at a high level. The goal of Emacs is the exact opposite: that the text editor should come to fit you like a well-worn glove, to the point where whatever keys are most intuitive to you are bound to whatever actions you use the most.|
Typing "vim ycombinator.text" creates a blank screen with color-coded tildes, which I can immediately understand are "this line is not actually in the file" markers. Good stuff. As I begin to just write out the first word, "function", I quickly notice that absolutely nothing is being written to screen, and the screen spits back some kind of error at me: "E35: No previous regular expression." Wha...?
Okay, so I have to google a tutorial to find out that, in fact, vim doesn't support editing by default. Instead, it dumps you into a command mode, and editing is accomplished by typing the "i" or "a" keys, to get into "insert" mode.
Insert mode is like a very crappy text editor, because it lacks all of the basic amenities you like (which have been moved to the command mode). For example, if you press the left arrow key at a new line, you might have expected to jump to the end of the last line. That doesn't happen. Also, it doesn't auto-indent your text. Word-wrap is enabled, but it's strictly speaking a character wrap: if your word is too long, it gets divided in half across the lines. And the up and down keys don't move within the virtual lines of a word-wrapped line, but instead move to the next physical line up and the next physical line down.
The command mode includes undo commands for anything that happens within command mode. These are vital if you want to waste your time with command mode, which is apparently necessary to be a real Power User of vim. So let's experiment with those. Poring over several web pages, it's a while before I find anything useful. Here's one: I prefer my tabs to be four spaces long. Since vim apparently doesn't grok this by default (i.e. notice that the last line I wrote was tab-indented four spaces, and thus make the next one the same), the command string goes something like this:
0i Esc,j0; ..j0...j0..j0. or so. Since most of these characters are typed with the right hand and they're in wildly different places, this is really inconvenient, but at least you get to see some sort of benefit from the multiple-periods.
Okay, but that doesn't need to be done in command mode. There's a third mode intended for problems just like this, called "Visual Mode" (really meaning "Selection Mode"). You type Ctrl-v, you select a box with your cursor, and then you type
I Esc to indent multiple lines by four spaces. And then you have to do it again for the interior code. And then again for the code inside that.
The rest of command mode strikes me as deliberately inconvenient. Yes, there is a backspace function, no, it isn't accessed by hitting the backspace key (bwuh?!), but rather by hitting Shift-X. Without the shift, the "x" is secretly the "Del" key. Some things are exceptions to this rule: Entire line edits can often be done by repeating a command twice, so ">>" indents a line, "dd" deletes a line, "yy" copies a line onto a clipboard. (Not *the* clipboard -- vim is completely oblivious to your system clipboard. It has its own internal clipboard, though, and you can copy things to it with "y".) The find key ("f", thank God) only works on the current line with a one-character input, and *cannot* be repeated by period -- to make it search backwards, you need to use F instead, to make it search for anything more than one character, you need the / key, which can be repeated (i.e., find the "next" result) with "n", even though the "f" key cannot be repeated with n. Incidentally, this is why I got the previous error, "E35: No previous regular expression."
To summarize: vim's actual text-editing component, insert mode, is a crippled and horrible mess, lacking the most basic of text-editing and code-editing idioms. To rectify this problem, vim includes a separate component, command mode, by default. Everything that you would expect to have a non-text-editor mode (e.g. find/replace, saving to a new filename) is jammed uncomfortably into this one modality, along with many things you would expect to have in text-editor mode (e.g. copying the current line, undo/redo, quit the editor, et cetera). One particular idiom which you'd expect to find in text-editor-mode, namely text selection, is in fact given a third mode completely different from the other two. And, as a bonus, all of the things you might like to do in command mode are based on a single-character language which you'll have to become fluent in before command mode becomes at all useful to you. And if you are generating large amounts of new code, command mode is not the idiom du jour -- command mode mostly suffices to help you search around in a document that's already been created and copy text from point A to point B.
Unless I've missed something.
I don't see why vim devotees claim that vim makes their coding much faster; I haven't seen anything in vim to suggest even a possible ~10% coding speedup waiting around the corner. Rather, since it lacks some basic amenities, it looks like vim would slow me down by ~10-20% even when I learn it fully and can use it at top-speed. It has brace highlighting. That's about it.
I've tried to also play around with emacs, which also isn't pretty. Possibly there's a follow-up post to this one coming?
|pwext, pwext, pwext...
||[Mar. 2nd, 2009|10:46 pm]
Yesterday I gave a grocer a 20 for 8 euro worth of goods, and he gave me 42 euro in change. I pointed out the error to him. Today, he wanted to give me something extra for being a good person. I declined, because, to be honest, I had already received the reward. I like impressing people, much more than an extra bottle of soda or a free box of Ben and Jerry's. If he's impressed enough to say, "Listen, you gotta let me pay you back," then that's enough for me. I don't know whether that's enough for him.|
Five commitments: Love, Honesty, Charity, Bravado, Humility. I never forget.
I promised myself that my password-derivation-applet, pwext, would be done by the end of January. It's really a fantastic and beautiful bit of code in its present form, I feel. It was, indeed, mostly done at the end of January. I still haven't created the "bookmarklet factory" that would generate a custom pwext bookmarklet based on user input, though.
However, at the end of January, it was clear to me that drostie.org needed a content management system of some form, since I've got dozens upon dozens of little files that I want to go up there. So I decided to work on that. My tentative deadline on that was "the end of February."
This deadline has passed and the framework has only moved partly forward; it's still not deployment-ready. So I've failed to make that deadline. I spent so much time bringing my laptop up to speed that it seems like the beginning of the month was spent on choosing an overall approach (I chose Python + Django, because I'd like to learn more Python and Django had several features I wanted) -- and only at the end of the month did I actually get a database model put together (and I still don't know to what extent I can add or subtract from it later). A large amount of the style and layout of drostie.org was was written about a year ago and is more or less still the system that I want today. The middle of the month feels like something's gone missing.
I want pwext out -- the complete package finished, with Django running on the server, and the bookmarklet-maker ready and working -- by the end of March. However, my laptop has just started acting up *again* (kubuntu no longer recognizes that it has a wireless connection). We'll see how well I do. I'm not exactly enthusiastic, since I've just missed a deadline and now I'm speculating on a new one. But I'm working on it.
My cash supply dwindles ever-lower. I want to get a CMS deployed so that I can show a potential employer, "*THIS* is what I do. Beautiful code, solving the real problems that I see around me, etc." But I also want people to use my inventions. That seems to require marketing, etc. -- I don't know.
Real artists ship, and I'm tired of not shipping.
|Food and Linux.
||[Feb. 22nd, 2009|08:51 pm]
"We're opening our oldest wine bottles," she said, "and I want something for the occasion. I've bought beef, and we have potatoes, and I'd like something French for the occasion, if you please."|
So I did. Or, at least, I tried.
My first demi-glace went perfectly, and formed a mushroom-based sauce for the steaks -- which were seared and then oven-roasted, since my benefactors have no grill. Almonds were crushed, roasted, and tossed in with green beans in a spectacular-if-minimalist vegetable dish.
The gratin that I tried to produce didn't come out so good. The flavors were *perfect*: just the exact amount of rosemary, just the exact amount of garlic, no hiding the beautiful potato flavor -- but for some reason the slagroom did not suddenly turn into a beautiful creamy inside. Instead of a perfectly-seasoned potato gratin, I got perfectly seasoned potato slices, in milk. Willemien told me that she thought she knew why: she usually buys very fast-cooking potatoes, and they're probably also very low-starch, so that there was no residual starch filling the cream and making it even creamier. I'm not so sure -- the slagroom seemed a bit watery to start with, in my opinion. But maybe you're supposed to add an egg or two, or something to that effect.
Dessert: creme fraiche, with some blackberries, my Oma's homemade jam, and ground chocolate shavings. Absolutely delightful in presentation and flavor.
My new laptop now boots into a completely-encrypted Kubuntu Linux install, as well as its Win XP install. The Kubuntu Jaunty alpha is treating me wonderfully; it's everything that I was hoping out of Vista before I had to downgrade and go back to Win XP. The only thing I wish, is that the Toshiba Portege M750 could boot out of its memory card reader. It would be a little more convenient for my intended setup. But it can boot from USB, so I just have the GRUB MBR code in the boot records for all of my USB drives. This loads a local /boot partition on my hard drive, which decrypts the root partition if I give it the right password.
||[Feb. 11th, 2009|08:05 pm]
Several of the algorithms in the SHA-3 competition are 512 bits or more.|
That doesn't sound like much, does it? But it's a tremendous amount, really.
The smallest unit of time that a modern physicist is concerned with is called the Planck time -- 5.3124×10-44 seconds. This is a time scale which, if you zoomed in close enough, something wildly different from anything familiar must be happening. (We know that something weird must be happening because we have two completely different theories -- general relativity, which we apply at the scale of galaxies, and the standard model, which we apply at the scale of the atomic nucleus. They're totally different, and as far as we can tell, completely incompatible with each other; but we can use both of them, because they don't overlap much in their respective domains. They don't overlap anywhere in the middle, either: at the scale of you or me, we neither have to calculate spacetime curvature, nor solve the Schödinger equation. But they do overlap at this miniscule time scale -- and at a similarly small length scale.)
This number, we might express as 2-143.75547 seconds or so. If you were going to count once every Planck second, you would only need 144 bits to count out one full second.
At the other end of the scale, the universe is 13.7 billion years old, give or take about 0.1-0.2 billion. This, we might express as 258.6 seconds or so. A modern 64-bit computer could count out the universe in seconds rather handily in one register.
Divide the second time by the first, and you get the total number of Planck times that have happened in the life of the universe. By exponent rules, you just add the numbers of bits: 58.6 + 143.8 = 202.4 bits.
An ideal 512-bit hash function is so secure that even if someone had one quadrillion of the fastest computers that a modern physicist can imagine, and worked on finding a single collision for the entire lifetime of the universe, their chance of success would only be about 8%. We would have to make a significant advance in physics before we could conjecture what sorts of sub-Planck computing could be possible. (We're not even completely sure what we can and can't do with quantum computing, and we're relatively solid on the fundamental rules of quantum mechanics.)
|National Condom Day
||[Feb. 10th, 2009|04:01 pm]
The Dutch have a National Condom Day. It is February 14th -- one day after the day when it really mattered. (I suppose good Hollandse sex happens after midnight?)
I have a laptop again! Finally. It's a Toshiba Portege (not a typo!), and it's a tablet PC. You cannot imagine the delight of dragging windows around Minority Report style. However, I must recommend against Toshiba as a brand: too much bloatware, and you must wait for all of it to install on the computer when it first arrives -- the operating system doesn't exactly come preinstalled.
It came with Vista. I tried to give it a fighting chance, I really did. They messed up the Start menu pretty bad, though, and the minor chrome adjustments that I saw were not really worth the extra wait at boot time. Toshiba kindly sent me Windows XP Tablet Edition restore disks instead of Vista restore disks, so I'm now running XP again.
(But I will install Kubuntu Linux on it sometime soon -- once I can wrap my head around what this guy did.)
pwext is almost finished as a bookmarklet. There is a delay because I don't yet have a framework that I can use to publish applications. I'm working on learning Django so that I can create something nice on drostie.org -- possibly with an eventual comment system. Also, I still need to create a setup script that will generate your pwext seed from random keypresses, et cetera. It's a bit of work, but it's going.
With a tablet PC, One Dark Dream (a webcomic/webdrama story that has been kicking around in my head for about a year now) can possibly start up. I'll have to see how long it takes for me to write the first few comics, draw them out, assemble them, and publish. One thing I don't like: in Windows, the cursor has this nasty tendency to lag behind the tablet pen, which might smooth the curves but it might also be annoying. I don't know yet.
|MD5-based SSL: totally, completely broken.
||[Jan. 10th, 2009|11:00 pm]
I think I'm going to keep teaching Crypto-in-plain-English on this blog for a while yet, so let's talk current events.|
To welcome in the new year, a group at the TU Eindhoven in the Netherlands successfully attacked SSL in a rather serious way: they created a certificate authority which every browser trusts, which could make anyone claim to be, for example, HSBC or the Bank of America -- or whoever you want. SSL is theoretically supposed to protect against that, but it has completely failed to do so.
SSL is that technology that, among other things, puts the padlock icon on your browser. Browsers try to present these visual cues to tell you that there is encryption going on "under the hood," so that someone can't just overhear your credit card number. The TUE group's live demo at the hacking conference was an open wireless router that could overhear anything you tried to transmit through it over SSL -- but all of the padlocks and browser notifications were still there.
It's called a man-in-the-middle attack. Suppose Alice wants to securely contact Bob. A malicious attacker named Mallory says "Hi, I'm Bob," and presents valid IDs and credentials to Alice. They begin a secure communication with each other. Then, Mallory impersonates Alice in a similar way when he opens a secure communication line to Bob. By sitting between these two secure communication lines, Mallory gets the entire communication recorded in his logs.
The TUE group falsified the identification of SSL: they devised a way to say "Hi, I'm HSBC" or "Hi, I'm Bank of America" and get completely trusted by the browser. They did it by breaking a hash function called MD5.
( Read more...Collapse )
- - -
Your moment of Zen? The people who are now switching away from MD5 are switching to another function named SHA-1. SHA-1 is much stronger than MD5, but it also has some serious flaws, and a couple of years of time will probably break that, too. They could solve this by using a newer, better generation called SHA-2. Every even-vaguely-modern browser supports it, as far as I know. And, in fact, right now there's a contest for developing the next generation after that, called SHA-3. (I'm rooting for Skein in that competition. Tweakable block ciphers are very pretty.)
||most recent entries