Unfortunately, what counts as “heavy math” is very relative. That’s why I never know what to recommend.

stats.stackexchange and metaoptimize are both reasonable, but the math in Bishop’s book isn’t “heavy” by the standards of these sites.

A *much* easier book to start with is the O’Reilly *NLTK* book (it’s about the Natural Language Toolkit package in Python and is written assuming the reader is a linguist who’s never programmed or done any math after high school).

Something in between the NLTK book and Bishop’s book is the Witten and Frank *Data Mining* book. It covers many of the same algorithms, but is much more practical.

Next up from there would be Manning and Schuetze’s NLP book, but it’s now very dated and still doesn’t go into much of the actual math required.

After that, I’m afraid you need to bite the bullet and learn the linear algebra. Strang’s book is great and I hear there’s also an MIT class online for it. And of course calc if you don’t know that — I don’t have a good reco for that.

]]>Thanks so much for the help, Bob! I’ve gotten a hold of Bishop’s book and it looks good, but despite some experience in linear algebra, I’ve already hit a few math items I have questions on. Besides the great online courses you mentioned, are there any good places to go for interactive help on such issues? I’ve seen a few (stats.stackexchange.com, metaoptimize.com, and openstudy.com), but I’m wondering if you have any other suggestions for where I could ask questions related to the heavy math in the books you recommended.

]]>I’m carp@alias-i.com. Google [bob carpenter] if you need to find me.

Other than linear algebra and calc and a bit of algorithms, the ante’s pretty low if you already know basic stats. Then it depends what you want to do. “Machine learning” is understood even more broadly than statistics. Do you want to scale something simple to the web? Build something smaller scale but more involved for sequence data?

I’ve always come at these things through applied problems that I’ve been interested in. The general advice to do it like I do it would be to read through NIPS proceedings until you see something you’d like to understand and then work backwards through all the bits required. Ideally with people around to help you get through the confusing parts.

Or if you’re a more bottom-up person, follow one of the Stanford or MIT online courses. Or read Bishop’s book if you know linear algebra and calc or Nocedal and Wright if you care more about optimization.

]]>I would absolutely love to be able to fully understand topics like SVM and SGD, but I come from a traditional stat/research methods background (psych) where ANCOVA and multiple linear regression are still state of the art.

I hate to bother you, but would you be willing to recommend the best places to get started so I can reach the point where I understand these topics? I’d like to REALLY understand them, not just know how to use them (I’ve taken this approach with traditional stat, and it has served me well – allowing me to avoid many mistakes “practitioners” make.)

It seems to require more than just an understanding of linear algebra… :-)

Thanks for your time,

Scott Edwards

I’m more concerned about (a) — whether they make sense. I just don’t see any way to independently evaluate that. I know people like to plug the results in some other end-to-end system they care about (e.g., use them as features to improve a classifier or tagger).

This is one of the issues Becky Passonneau and I are wrestling with for word-sense annotation. Different people may understand word senses differently and some instances of words may fall between the cracks of any discrete sense inventory. So the task of finding “the” sense of an instance of a word may not make sense. There’ll always be some agreement and some disagreement until users do what’s commonly known as “semantics” and adjudicate the meaning of the words they’re using among themselves.

But we can get good kappa scores and build systems that agree with any annotation scheme pretty well. The statistical classifiers are pretty robust to noise and there are some core cases on which everyone seems to agree.
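
For anyone who hasn’t computed a kappa score before, here is a minimal sketch of Cohen’s kappa for two annotators, which corrects raw agreement for the agreement expected by chance. The function name `cohen_kappa` and the toy sense labels are made up for illustration; this is not the measure or the data from any particular annotation project, just one common definition.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Illustrative helper (hypothetical name), assuming one label per item
    from each annotator and items given in the same order.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies,
    # summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy example: made-up sense labels for four instances of a word.
annotator_1 = ["sense1", "sense1", "sense2", "sense2"]
annotator_2 = ["sense1", "sense1", "sense2", "sense1"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In the toy example the annotators agree on 3 of 4 items (0.75), chance agreement is 0.5, and so kappa comes out to 0.5.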

]]>Were you to submit a paper to a conference, the absence of a second rater might engage the lizard brain of some reviewer, and lead to rejection. Actually, it could be that some reviewers can do this as a spinal reflex, with no use of the brain whatsoever.

]]>