I was shocked and dismayed when I heard from a reader that I’d used the term “model” over 200 times in the LingPipe book without ever saying what it meant. This kind of thing is why it’s so hard to write introductory material.
Perhaps I shouldn’t have been surprised at this comment, because other people had expressed some uncertainty about the term “model” to me in the past.
What is a (Mathematical) Model?
In short, when I say “model”, I mean it in the bog-standard scientific sense, as explained on:
- Wikipedia: Mathematical Model
Quite simply, it’s just a bunch of math used to describe a phenomenon. Nothing philosophically or conceptually exotic here, just the usual scientific method.
For instance, Newton’s equation (force equals mass times acceleration) is a mathematical model that may be used to describe the motions of the planets, among other things. Newton derived his model from Kepler’s observation that the planets swept out equal areas in equal times in their orbits. Newton realized that by introducing the notion of “gravity”, he could model the orbits of the planets. Of course, he had to invent calculus to do so, but that’s another story.
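To see the model make a concrete prediction, here’s a toy sketch of my own (not part of Newton’s derivation, obviously): combining F = ma with Newton’s law of gravitation F = GMm/r² for a circular orbit, the centripetal acceleration a = v²/r gives an orbital speed v = √(GM/r), which comes out near the Earth’s observed ~30 km/s speed around the Sun.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # mass of the Sun, kg
R = 1.496e11       # mean Earth-Sun distance, m (1 AU)

# For a circular orbit, setting gravitational force G*M*m/r^2 equal to
# m*a with centripetal acceleration a = v^2/r yields v = sqrt(G*M/r).
v = math.sqrt(G * M_SUN / R)

print(f"predicted orbital speed: {v / 1000:.1f} km/s")  # about 29.8 km/s
```

The point isn’t the code, of course; it’s that a couple of lines of math yield a testable number.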
Prediction vs. Postdiction
Typically, models are used for predicting future events, but sometimes they’re used to retroactively understand past events (“backcasting” [“aftcasting” if you’re nautical] is the temporal opposite of “forecasting”, and “postdiction” the opposite of “prediction”). For instance, climate scientists attempt to postdict/backcast Earth temperatures from data such as tree rings; we’re working on fitting models of such data with Matt Schofield as part of our Bayesian inference project at Columbia.
All Models are Wrong, but …
As the statistician George E. P. Box said, “Essentially, all models are wrong, but some are useful.” For instance, Newton’s model is wrong in that it doesn’t correct for relativistic effects at very high velocities. But it proved useful at predicting everything from eclipses to the tides.
The models we’ve used in LingPipe, such as the HMM model of part-of-speech tagging, are also clearly wrong. Language just isn’t Markovian (meaning that the n-th word depends only on a fixed window of the previous few words). But we can still do pretty well at predicting part-of-speech tags with the simplified model.
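To make the Markov assumption concrete, here’s a minimal bigram-HMM sketch of my own in Python (not LingPipe code; the two-tag grammar and all probabilities are made-up toy numbers). The model factors P(tags, words) as a product of P(tagₙ | tagₙ₋₁) transitions and P(wordₙ | tagₙ) emissions, and Viterbi search finds the highest-probability tag sequence:

```python
START = "<s>"
TAGS = ["DET", "NOUN"]

# P(tag_n | tag_{n-1}): toy transition probabilities
TRANSITION = {
    (START, "DET"): 0.7, (START, "NOUN"): 0.3,
    ("DET", "NOUN"): 0.9, ("DET", "DET"): 0.1,
    ("NOUN", "DET"): 0.7, ("NOUN", "NOUN"): 0.3,
}

# P(word_n | tag_n): toy emission probabilities
EMISSION = {
    ("DET", "the"): 0.9, ("DET", "dog"): 0.1,
    ("NOUN", "dog"): 0.7, ("NOUN", "the"): 0.3,
}

def viterbi(words):
    """Return (probability, tag sequence) of the best path for words."""
    # best[tag] = (prob of best path ending in tag, that path)
    best = {t: (TRANSITION.get((START, t), 0.0)
                * EMISSION.get((t, words[0]), 0.0), [t])
            for t in TAGS}
    for w in words[1:]:
        best = {t: max((p * TRANSITION.get((prev, t), 0.0)
                        * EMISSION.get((t, w), 0.0), path + [t])
                       for prev, (p, path) in best.items())
                for t in TAGS}
    return max(best.values())

prob, tags = viterbi(["the", "dog"])
print(tags)  # ['DET', 'NOUN']
```

The wrongness of the model is visible right in the transition table: each tag is predicted from the single previous tag alone, ignoring everything else in the sentence — and yet that crude assumption is enough to tag simple sentences correctly.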