Blame Canada for more n-gram applications


The Canadians are as psyched as we are about character n-grams and have applied them to a host of new problems: (1) Alzheimer’s type classification from transcripts, (2) signature-based virus detection from executables, (3) author gender attribution, (4) document clustering, (5) spam filtering, and even (6) genome sequence clustering and classification.

Check them out on Vlado Keselj’s List of Publications. Vlado, who’s now at Dalhousie after a Ph.D. at Waterloo, seems to have taken the torch from Fuchun Peng, who recently graduated from Waterloo and moved to UMass. Fuchun’s dissertation is well worth reading for its wide range of character n-gram classification evaluations.

Anyone game to recreate any of this work in LingPipe?
