Etaoin Shrdlu

I had an interesting idea about using statistical mapping for gibberish generation. There are some fun projects out there, like the Hacker Typer, that let you hit random letters and code comes out, but I think there's some unused information there that's just begging to be put to use. Specifically, the letters you press have a pattern, and you could use that to form patterns in the output.

So here's the idea: take the letter bigram statistics (the conditional probabilities of a given letter following another) and map them onto word bigram statistics (the conditional probabilities of a given word following another). So if you type "T", the most common initial letter, that maps to "The", the most common initial word. If you then type "h", the most common letter following "t", it maps to "same", the most common word following "the", and so on.
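To make the "most common letter following another" part concrete, here is a rough sketch of how the ranked tables could be built. The `ranked_successors` helper, the `START` marker, and the tiny corpus are all made up for illustration; real rankings would need a large body of text, and treating each word as its own letter sequence is just one possible choice.

```python
from collections import Counter, defaultdict

START = None  # hypothetical marker meaning "nothing has come before"

def ranked_successors(sequences):
    """For each token, list the tokens that follow it, most frequent first."""
    counts = defaultdict(Counter)
    for seq in sequences:
        prev = START
        for token in seq:
            counts[prev][token] += 1
            prev = token
    # Sort by descending count, breaking ties alphabetically for determinism.
    return {
        prev: [tok for tok, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))]
        for prev, c in counts.items()
    }

# Toy corpus, invented for this example only.
sentences = [
    "the same time".split(),
    "the same way".split(),
    "the same time again".split(),
    "a different way".split(),
]

# Word model: each sentence is a sequence of words.
word_ranks = ranked_successors(sentences)
# Letter model (an assumption): each word is a sequence of letters.
letter_ranks = ranked_successors(word for s in sentences for word in s)

print(letter_ranks[START][0], "->", word_ranks[START][0])
print(letter_ranks["t"], word_ranks["same"])
```

Position 0 in each list is the most common successor, position 1 the next most common, and so on, which is all the mapping needs.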

If you input "the", it would generate "The same time". If you input "this", it would generate "The same way that", because "i" is the 3rd most common letter after "h" and "way" is the 3rd most common word after "same". In each case you're just using the ranks of the bigrams to map one to the other, but it means the generated text would be roughly as probable as your input text, so you would get more gibberishy gibberish if you hit random keys than if you typed feasible sentences.
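The rank-for-rank mapping described above could be sketched like this. The tables are hand-written to reproduce the two examples, not measured from any corpus: the second-place entries "a" and "thing" are placeholder fillers, the lists are truncated, and `translate`, `START`, and the table names are hypothetical. Stopping at an unseen letter pair and clamping an out-of-range rank are my own guesses at how to handle the gaps.

```python
START = None  # hypothetical marker meaning "nothing typed yet"

# Toy ranked successor lists, most common first (illustrative, not real data).
LETTER_RANKS = {
    START: ["t"],
    "t": ["h"],
    "h": ["e", "a", "i"],  # "a" is a made-up filler for 2nd place
    "i": ["s"],
}
WORD_RANKS = {
    START: ["The"],
    "The": ["same"],
    "same": ["time", "thing", "way"],  # "thing" is a made-up filler
    "way": ["that"],
}

def translate(typed, letter_ranks, word_ranks):
    """Map each typed letter to the word holding the same bigram rank."""
    words = []
    prev_letter, prev_word = START, START
    for ch in typed.lower():
        letter_options = letter_ranks.get(prev_letter, [])
        word_options = word_ranks.get(prev_word, [])
        if ch not in letter_options or not word_options:
            break  # unseen bigram; a fuller version would need a fallback
        rank = letter_options.index(ch)
        # Clamp in case the word list is shorter than the letter list.
        word = word_options[min(rank, len(word_options) - 1)]
        words.append(word)
        prev_letter, prev_word = ch, word
    return " ".join(words)

print(translate("the", LETTER_RANKS, WORD_RANKS))   # The same time
print(translate("this", LETTER_RANKS, WORD_RANKS))  # The same way that
```

Each keypress produces one whole word, so the output grows much faster than the input.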

Of course, you could also feed the output text back into the input to get arbitrarily large amounts of statistically similar garbage. Might be useful for padding out essays!