Wow! These visualizations, which I just saw linked from Slashdot, blew me away:
I particularly like the word associations visualization, which takes a pair of words such as good/evil, looks at the words that follow each of them in bigrams, groups those following words into bands by conditional probability, and then sorts the words within each band by unigram frequency. The word spectrum visualization is also nice. Through careful use of space and scale, Harrison was able to show much more information than I've ever seen in a graph like this. Usually they look like giant hairballs.
The natural language processing part of this exercise is pretty much trivial. It’d be easy to do with the LingPipe language modeling package, for instance.
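To show how little computation is involved, here is a minimal sketch in plain Python rather than LingPipe. The scoring rule is an assumption, not necessarily Harrison's exact method. For each word w that follows either `left` or `right`, it uses the maximum-likelihood bigram estimates P(w | left) and P(w | right) and scores w by P(w | left) / (P(w | left) + P(w | right)). It then cuts that score into `num_bands` equal-width bands and sorts each band by unigram count. The function name `word_association_bands` and the band count are hypothetical.

```python
from collections import Counter


def word_association_bands(tokens, left, right, num_bands=5):
    """Hypothetical sketch: group words that follow `left` or `right` into bands.

    Assumed score for a follower w:
        P(w | left) / (P(w | left) + P(w | right))
    where P(w | x) = count(x w) / (number of bigrams starting with x).
    Band 0 holds words most associated with `right`; the last band holds
    words most associated with `left`.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    # Number of bigrams whose first word is left / right.
    left_total = sum(c for (a, _), c in bigrams.items() if a == left)
    right_total = sum(c for (a, _), c in bigrams.items() if a == right)
    if left_total == 0 or right_total == 0:
        raise ValueError("both words must start at least one bigram")

    followers = {b for (a, b) in bigrams if a in (left, right)}
    bands = [[] for _ in range(num_bands)]
    for w in followers:
        # Counter returns 0 for unseen bigrams, so one of these may be 0.
        p_left = bigrams[(left, w)] / left_total
        p_right = bigrams[(right, w)] / right_total
        score = p_left / (p_left + p_right)  # in [0, 1]
        # A score of exactly 1.0 goes into the top band, not past the end.
        index = min(int(score * num_bands), num_bands - 1)
        bands[index].append(w)

    # Within each band, most frequent words first; ties broken alphabetically.
    for band in bands:
        band.sort(key=lambda w: (-unigrams[w], w))
    return bands


if __name__ == "__main__":
    text = "good food is good fun but evil plans are evil fun and good food"
    print(word_association_bands(text.split(), "good", "evil", num_bands=3))
```

On that toy sentence, "plans" lands in the evil band, "food" in the good band, and "fun", which follows both words, lands in the middle band. Real visualizations would use counts from a large corpus instead of a single sentence.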
I'd like to see some part-of-speech data visualized this way, but that would be of more interest to linguistics geeks than to the general public. Translation would also be interesting if you knew two languages. The Netflix data or other collaborative filtering data would be fun to visualize, too, as would phrasal data with a binary feature, such as Ryan McDonald et al.'s phrase-sentiment graph.