Language Model Generated Injection Attacks: Cool/Disturbing LingPipe Application

March 9, 2010

Joshua Mason emailed us with a link to his recent ACM paper (with a bunch of co-authors), “English Shellcode”. Shellcode attacks attempt to seize control of a computer by masquerading as data. The standard defense is to look for telltale patterns in the data that reflect the syntax of assembly language instructions. It is sort of like spam filtering. The filter would have to reject strings that looked like raw machine code, which would not be too hard if you knew to expect language data.
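Here is why such a filter is easy when you expect language data. A deliberately naive check accepts input only if nearly all of its bytes are printable text. This is a hypothetical illustration, not the standard defense and not the paper's method. The class name `TextLikeFilter`, the 0.95 threshold, and the sample byte sequence (a commonly circulated 23-byte x86 Linux execve shellcode) are assumptions chosen for the example.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical, deliberately naive filter: accept input only if almost all of
 * its bytes look like printable ASCII text. Not the detection method from the paper.
 */
public class TextLikeFilter {

    /** Fraction of bytes that are printable ASCII (0x20-0x7E) or tab/LF/CR. */
    public static double printableFraction(byte[] data) {
        if (data.length == 0) return 1.0;
        int printable = 0;
        for (byte b : data) {
            int v = b & 0xFF; // Java bytes are signed; map to 0..255
            if ((v >= 0x20 && v <= 0x7E) || v == '\t' || v == '\n' || v == '\r') {
                printable++;
            }
        }
        return (double) printable / data.length;
    }

    /** True if at least `threshold` of the bytes are printable text. */
    public static boolean looksLikeText(byte[] data, double threshold) {
        return printableFraction(data) >= threshold;
    }

    public static void main(String[] args) {
        // A commonly circulated 23-byte x86 Linux execve("/bin//sh") shellcode.
        int[] raw = {0x31, 0xc0, 0x50, 0x68, 0x2f, 0x2f, 0x73, 0x68, 0x68, 0x2f, 0x62,
                     0x69, 0x6e, 0x89, 0xe3, 0x50, 0x53, 0x89, 0xe1, 0xb0, 0x0b, 0xcd, 0x80};
        byte[] shellcode = new byte[raw.length];
        for (int i = 0; i < raw.length; i++) shellcode[i] = (byte) raw[i];

        byte[] english = "There is a major center of economic activity."
                .getBytes(StandardCharsets.US_ASCII);

        System.out.println(looksLikeText(shellcode, 0.95)); // false (14 of 23 bytes printable)
        System.out.println(looksLikeText(english, 0.95));   // true
    }
}
```

The English-looking attack string quoted in this post contains only ordinary letters, spaces and punctuation. A check like this would accept it, which is exactly what makes it hard to filter.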

Mason et al. changed the code generation process so that lots of variants of the injection are tried and then filtered against a language model of English built from the text of Wikipedia and Project Gutenberg. The result is an injection attack that looks like:

“There is a major center of economic activity, such as Star Trek, including The Ed Sullivan Show. The former Soviet Union.”

This is way better than I would have thought possible, and it is going to be very difficult to filter. It would be interesting to see how automatic essay-grading software would score the above. It is gibberish, but sophisticated-sounding gibberish.

And it used LingPipe for the language processing.
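The filtering step keeps only candidates that an English language model finds plausible. A character trigram model can sketch it. This is a hypothetical toy under several assumptions:

- It does not use LingPipe's API. LingPipe provides its own character n-gram language models with more careful smoothing.
- Add-one smoothing over a 128-character ASCII alphabet is swapped in for simplicity.
- The training text is a tiny made-up sample, not Wikipedia and Project Gutenberg.
- The class and method names are invented for illustration.
- It covers only the filtering half. Generating variants that still work as machine code is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical toy character trigram language model with add-one smoothing,
 * used to keep only "English-like" candidate strings. Not LingPipe's API.
 */
public class CharTrigramFilter {
    // Assumes 7-bit ASCII text, so each context has 128 possible next characters.
    private static final int ALPHABET_SIZE = 128;
    private final Map<String, Integer> contextCounts = new HashMap<>();
    private final Map<String, Integer> trigramCounts = new HashMap<>();

    public CharTrigramFilter(String trainingText) {
        String padded = "  " + trainingText; // two spaces give the first chars a context
        for (int i = 2; i < padded.length(); i++) {
            contextCounts.merge(padded.substring(i - 2, i), 1, Integer::sum);
            trigramCounts.merge(padded.substring(i - 2, i + 1), 1, Integer::sum);
        }
    }

    /** Average log2 P(char | previous two chars); higher means more English-like. */
    public double averageLog2Prob(String text) {
        if (text.isEmpty()) return Double.NEGATIVE_INFINITY;
        String padded = "  " + text;
        double total = 0.0;
        for (int i = 2; i < padded.length(); i++) {
            int context = contextCounts.getOrDefault(padded.substring(i - 2, i), 0);
            int trigram = trigramCounts.getOrDefault(padded.substring(i - 2, i + 1), 0);
            // Add-one (Laplace) smoothing so unseen trigrams keep nonzero probability.
            double p = (trigram + 1.0) / (context + ALPHABET_SIZE);
            total += Math.log(p) / Math.log(2.0);
        }
        return total / text.length();
    }

    /** Keeps candidates whose average per-character log2 probability meets the threshold. */
    public List<String> keepEnglishLike(List<String> candidates, double threshold) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (averageLog2Prob(candidate) >= threshold) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Tiny made-up training text standing in for a large English corpus.
        CharTrigramFilter lm = new CharTrigramFilter(
            "there is a major center of economic activity in the city. "
            + "the center of the city is near the river. "
            + "there is a lot of activity there.");
        List<String> candidates = Arrays.asList("there is a center of the city", "xq#zv!9@kw");
        for (String c : candidates) {
            System.out.printf("%-32s %.3f%n", c, lm.averageLog2Prob(c));
        }
        System.out.println(lm.keepEnglishLike(candidates, -6.5)); // [there is a center of the city]
    }
}
```

With a toy corpus this only separates obvious gibberish from text resembling the training data. A model trained on a large corpus is what lets the surviving strings read like the sophisticated-sounding gibberish quoted above.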

I am a firm believer in white hats publicizing exploits before black hats deploy them surreptitiously. This one could be a real problem, however.