I forgot to include the example in yesterday’s post. I’m reconstructing this from memory, so it doesn’t exactly match the one Dave Lewis used in his talk.
Suppose you have a decision to make among different modes of transportation to get you from home to work: car, bus, train, bicycle, walking. Each will come with a cost in dollars and time (and risk, but I’ll leave that to Professor Risk).
It makes sense to tie the parameters for cost and for time across the different modes of transportation. We'll use dimension 1 of our feature vectors for time (in minutes) and dimension 2 for money (in US dollars). For instance, x = (15, 1) represents a mode of transportation for a person that takes 15 minutes and costs one dollar.
It's statistics, so in order to fit a model, we need exchangeable replicates. In this case, we use multiple commuters. For instance, suppose we consider commuter 1, for whom walking takes 30 minutes and has no cost, taking the bus takes 20 minutes and costs $2, and driving takes 10 minutes and costs $5. With our simple model of transportation choice, we represent the three outcomes for commuter 1 as

x[1,walk] = (30, 0)

for walking,

x[1,bus] = (20, 2)

for taking the bus, and

x[1,car] = (10, 5)

for driving.
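To make the feature layout concrete, here is a minimal sketch in Python; the variable name and dictionary keys are mine, chosen for illustration, not part of the model itself:

```python
# Feature vectors for commuter 1: index 0 is time in minutes,
# index 1 is cost in US dollars. The same two dimensions are
# reused (tied) across all three transportation modes.
x1 = {
    "walk": (30.0, 0.0),  # 30 minutes, free
    "bus":  (20.0, 2.0),  # 20 minutes, $2
    "car":  (10.0, 5.0),  # 10 minutes, $5
}
```

The tying shows up in the fact that every mode's vector has the same two dimensions with the same interpretation.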
We might then have a second person who lives further from work, with feature vectors x[2,walk] for walking, x[2,bus] for the bus, and x[2,car] for driving, defined the same way from that commuter's times and costs.
The model parameters are the two coefficients, one for minutes and one for dollars, yielding a 2D vector β = (β[time], β[money]).
Suppose we have fit a model (presumably with a well-defined prior), and want to look at predictions. For person 1 we get probabilities for choice of transportation mode

p(walk) = exp(β · x[1,walk]) / Z

for walking,

p(bus) = exp(β · x[1,bus]) / Z

for busing, and

p(car) = exp(β · x[1,car]) / Z

for driving, where the normalizer Z = exp(β · x[1,walk]) + exp(β · x[1,bus]) + exp(β · x[1,car]) makes the three probabilities sum to 1.
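Here's a sketch of that prediction step in Python. The coefficient values in beta are made up purely for illustration (negative, on the assumption that longer and costlier commutes are less attractive); a real model would estimate them from data:

```python
import math

def choice_probabilities(features, beta):
    """Multinomial logistic prediction: p(mode) proportional to exp(beta . x[mode])."""
    scores = {mode: sum(b * f for b, f in zip(beta, x))
              for mode, x in features.items()}
    z = sum(math.exp(s) for s in scores.values())  # normalizer
    return {mode: math.exp(s) / z for mode, s in scores.items()}

# Commuter 1's feature vectors: (minutes, dollars) per mode.
x1 = {"walk": (30.0, 0.0), "bus": (20.0, 2.0), "car": (10.0, 5.0)}

# Hypothetical coefficients: each extra minute and each extra dollar
# lowers a mode's score.
beta = (-0.1, -0.5)

probs = choice_probabilities(x1, beta)  # probabilities sum to 1
```

Note how the same two coefficients score every mode; that is exactly what the parameter tying buys you.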
This all fits very neatly into the max-ent or DCA style of generating features for a multinomial logistic model. It even seems easier to conceptualize this way than as tied parameters.
Of course, if we really wanted to model commuter choices, we'd use a much more complex model including features like (disposable) income, neighborhood, etc., plus some non-linear combinations of existing features. Not all of these need to be shared across choices the way time and money are.