The confusion arises because when I read “generative” models I think of the models Nature uses to generate the data we observe, otherwise known as structural models. Hence I thought you were imposing constraints on Nature.

I can’t speak for what Nature can and cannot do, but if the multivariate distribution It creates is correlated across the predictors X1[n] and X2[n], then I, as a lowly observer of data, will get correlation in my estimates of b1 and b2.

If I knew the multivariate normal they were drawn from, I could decompose it into two independent variables Y1[n] and Y2[n] plus a translation, rotation, and scaling matrix. So if you can model the generative process of X1 and X2, then you’d have the whole story. But there’d still be correlation in your posterior for all of these parameters (or a non-diagonal Fisher information matrix if you’re a frequentist).
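The decomposition above can be sketched numerically. This is a minimal illustration, not anyone’s actual model: the mean and covariance below are made up, and the rotation/scaling comes from an eigendecomposition of the known covariance.

```python
import numpy as np

rng = np.random.default_rng(1234)

# Correlated predictors X1[n], X2[n] drawn from a known multivariate normal
# (mu and Sigma are invented for illustration).
mu = np.array([1.0, -2.0])               # translation
Sigma = np.array([[2.0, 1.5],
                  [1.5, 2.0]])           # correlated covariance
X = rng.multivariate_normal(mu, Sigma, size=10_000)

# Eigendecomposition Sigma = Q diag(lam) Q'.  Undoing the translation,
# rotating by Q, and rescaling by 1/sqrt(lam) recovers (approximately)
# independent standard normals Y1[n], Y2[n].
lam, Q = np.linalg.eigh(Sigma)
Y = (X - mu) @ Q / np.sqrt(lam)

print(np.corrcoef(X.T)[0, 1])   # around 0.75: strongly correlated
print(np.corrcoef(Y.T)[0, 1])   # near zero: decorrelated
```

The same transform could be built from a Cholesky factor instead of an eigendecomposition; any square root of Sigma works.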

Is Nature barred from doing that?

Also, that two regressors covary is neither necessary nor sufficient for the parameters attached to them to covary as well. It might be an empirical regularity, but it need not hold.

But we agree on this: “that statistical inference itself isn’t specific to causal inference and doesn’t intrinsically say anything about causation.”

With a Bayesian posterior, you do model the posterior covariance. But the “why” is a different question.

You are never going to ensure independence of parameters in models. Even in a simple linear regression, y[n] = a + b * x[n] + e[n], you find correlation between the estimates of the slope b and the intercept a. In a more realistic case, if I use income and education as predictors, you’ll find the coefficients correlated because the predictors are correlated. You could decorrelate the predictors for any given sample using SVD, but it’s a lousy assumption that the resulting transform will decorrelate the entire population.
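Both claims are easy to check numerically. The sampling covariance of the least-squares estimates is proportional to (X'X)^{-1}, so its off-diagonals are nonzero whenever the design columns aren’t orthogonal. The numbers below (predictor means, correlation strength) are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Intercept plus a predictor with nonzero mean: slope and intercept covary.
x = rng.normal(5.0, 1.0, size=n)             # mean far from zero
X = np.column_stack([np.ones(n), x])         # [intercept, slope] design
cov = np.linalg.inv(X.T @ X)                 # proportional to cov of (a, b) estimates
print(cov[0, 1])                             # negative: intercept trades off against slope

# Income/education stand-ins: correlated predictors, correlated coefficients.
z1 = rng.normal(size=n)
z2 = 0.8 * z1 + 0.6 * rng.normal(size=n)     # corr(z1, z2) ~ 0.8 by construction
Z = np.column_stack([z1, z2])
cov2 = np.linalg.inv(Z.T @ Z)
print(cov2[0, 1])                            # negative off-diagonal: correlated estimates
```

Centering x would zero out the first off-diagonal for this sample, which is the one-predictor version of the SVD trick, and it fails out of sample for exactly the reason stated above.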

I’m not sure the above framework allows for such questions. Sure, one can add another hierarchy of parameters, but at some point the buck stops. And at that point Bayes seems to lack an answer.

Some may argue the question is not pertinent. I would argue that the point of a (causal) model is precisely to add enough explanatory variables to ensure the independence of parameters.

Or, to put it another way, to seek a factual explanation for the implicit heterogeneity.

There are techniques for approximate Bayesian inference, such as using a point estimate based on the MAP (equivalent to regularized or penalized MLE) or on the posterior mean (or the posterior median, its L1 equivalent, which minimizes a different point-estimate loss function). Alternatively, you can use variational inference, expectation propagation, or a Laplace approximation to approximate the whole posterior.
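One concrete instance of the MAP-as-penalized-MLE equivalence: for a linear model with unit noise variance and a zero-mean Gaussian prior on the coefficients, the MAP estimate is exactly ridge regression. A minimal sketch on simulated data (the true coefficients, noise scale, and prior precision `lam` are all made up):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
n, lam = 200, 2.0
X = rng.normal(size=(n, 2))
y = X @ np.array([1.0, -0.5]) + rng.normal(scale=0.5, size=n)

# Closed-form ridge solution: (X'X + lam I)^{-1} X'y.
ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

def neg_log_posterior(b):
    # Gaussian log likelihood (unit noise variance assumed in the model)
    # plus Gaussian log prior, negated and with constants dropped.
    return 0.5 * np.sum((y - X @ b) ** 2) + 0.5 * lam * np.sum(b ** 2)

map_est = minimize(neg_log_posterior, x0=np.zeros(2)).x
print(ridge, map_est)   # agree up to optimizer tolerance
```

The point stands either way: whether you take the posterior mode, mean, or a full variational approximation, you are still summarizing a posterior over a generative model.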

But none of this changes a “generative” model to a “discriminative” one.
