Wednesday, February 12, 2014

Bayesian Model Selection - A Worked Example

Choosing between non-nested models can be challenging. A lot of statisticians and econometricians find that a Bayesian approach has a lot to offer when it comes to addressing this challenge. I'm certainly of that view myself.

Let me take you through an empirical example of Bayesian model selection - it involves alternative regression models for the demand for beer in Australia - and it's one that I use, sometimes, in class.
By way of background (prior) information, I ask the students to read two papers - Saffer and Chaloupker (1998) and Zhao and Harris (2003) - before the lab. class. The idea is for them to form their own individual priors about the price elasticity of demand for beer, before they examine the data that we're going to use. 

The sample comprises only 9 annual observations. This might seem to be a small sample, but to a Bayesian that's not a big problem.

(On the other hand, if the sample size is very large, Bayesian methods generally converge to the corresponding likelihood-based methods.)

The demand equations that are estimated are of the form:

                   log(QB) = α + βlog(PB / PX) + ε      ;   ε ~ N[0 , σI] ,

where QB and PB are the quantity and price of beer, and PX is the price of a substitute (complementary?) good - like wine, spirits, or marijuana. So, there are non-nested competing models, defined by the choice of the substitute/complementary good. (Strictly speaking, a log-log demand model is not good microeconomics, but it will help us here with the specification of the priors for the parameters. And, yes, I know it would be nice to have an income effect in the  model.)

You can find the details of the formulae for the various components of our Bayesian analysis, including the Bayesian Posterior Odds (BPO) in this handout. I'm assuming normally distributed errors for the regression models, and I'm placing a natural-conjugate prior on the parameters of each model. This is rather restrictive, but it does mean that we get analytically tractable results.

Now we're ready to get down to business. The data I'm using are on the data page for this blog. You can easily program up the calculations in any language of your choice. An EViews program file and workfile are on this blog's code page. Here's the program file for computing the BPO for any pair of the three models - there's really not much to it , and there are some comments, in red, embedded in the code:
smpl 1990 1998
scalar n = 9
scalar prior_odds = 1     ' Assign the prior odds in favour of Model 1 over Model 2
scalar w1 = 2.001    'In this, & the next 7, lines we provide the parameters for the prior
scalar w2 = 2.001    ' w1 and w2 must exceed 2.0 to ensure the existence of the 2nd moment
scalar c1 = 0.0122
scalar c2 = 0.0122
scalar a1 = 0.053
scalar a2 = 0.053
scalar bbar1 = -1.0        ' The mean of our prior for each price elasticity is -1
scalar bbar2 = -1.0
series y=log(qb)
series x1=log(pb / ps)   'Here we have just 2 models one with X=spirits & one with X=wine
series x2=log(pb / pw)    'Change "pw" to "pm" to get the BPO when X=spirits and X=marijuana
series y=y-@mean(y)       'Take deviations about the sample mean to eliminate the intercept 
for !i=1 to 2
  series x!i=x!i-@mean(x!i)           'Take deviations about the sample means for the X data
  equation eq! y x!i
  scalar bols!i=c(1)                   'Compute the OLS (ML) estimators
  series resid2!i=(y-c(1)*x!i)^2
  scalar ssq!i=@sum(resid2!i) / (n-1)
  series xsq!i=x!i*x!i
  scalar xx!i=@sum(xsq!i)
  series xy!i=x!i*y
  scalar x!iy=@sum(xy!i)
  series ysq=y*y
  scalar yy=@sum(ysq)
  scalar bayes!i=(a!i*bbar!i+x!iy) / (a!i+xx!i)     'Compute the natural conjugate Bayes estimators
  scalar  npcsq!i=w!i*c!i^2+yy+(bbar!i^2*a!i)-bayes!i^2*(a!i+xx!i)
  scalar postvar!i=((npcsq!i^2)/(n+w!i-2))/(a!i+xx!i)
  scalar qa!i=a!i*(bbar!i-bayes!i)^2
  scalar qb!i=xx!i*(bols!i-bayes!i)^2
  scalar delta!i=((n-1)*ssq!i+qa!i+qb!i) / n
  scalar cdel!i=(c!i^2/delta!i)
scalar term1=@sqrt((a1/a2)*((a2+xx2) / (a1+xx1)))
scalar term2=(delta1/delta2)^(-n/2)
scalar term3= (cdel1/cdel2)
scalar term4=@dfdist(cdel1,w1,n) / @dfdist(cdel2,w2,n)
scalar bayesfactor=TERM1*TERM2*TERM3*TERM4
scalar bpo=prior_odds*bayesfactor     ' BPO in favour of 1st model over 2nd model

If we have any symmetric loss function in mind across the model space, we'd favour Model 1 over Model 2 if BPO > 1.

So, let's take a look at some results, based on the following two competing non-nested models:

M1:        log(QB) = α1 + β1log(PB / PS) + ε1       ;   ε1 ~ N[0 , σ1I]
M2:        log(QB) = α2 + β2log(PB / PM) + ε2      ;   ε2 ~ N[0 , σ2I]

The prior odds for the two models are set to 1.0, and the means of the priors for β1 and β2 are each -1. The ML (OLS) point estimates of these elasticities are -1.164 and -0.403 respectively, as we see in the following results:

(The intercepts don't appear in the results above, as the data have been transformed into deviations about sample means - see the program.)

I imagine that most of you would prefer the second of these estimated models, if you really had to choose one of them. I'm looking at the significance of the regressors, and the AIC and SC values.

The Bayes point estimates of the price elasticities in models 1 and 2 are -1.003 and -0.536 respectively. Both lie between the corresponding MLE and prior mean values, as expected in this univariate model.

The Bayes factor turns out to be 0.077, and as the prior odds for the models are set to 1.0, the BPO value is also 0.077. The Bayesian analysis strongly favours Model 2, if you have a symmetric loss function. You'd need prior odds of almost 13 to 1 in favour of Model 1 before the BPO would favour this model over Model 2.

Of course, the results are also sensitive to the choice of the priors on the parameters of the models. Putting the prior odds on the models back to 1.0, but changing the means of the priors on the elasticities to -1.5 in each case, yields Bayes estimators of these parameters of -1.493 and -0.647 respectively. In this case the Bayes factor (and BPO) is 36.76, and so Model 1 would be favoured over Model 2, under a symmetric loss function over the model space.

If you want to play around with the program, you can compute the BPO when the third model (where the regressor is price of beer relative to the price of wine).

So, here's one unsurprising, but important, message coming out of this analysis. Even though the data may strongly favour one model specification over another, model rankings can easily be altered once any available prior information is taken into account.

That's part of what being a Bayesian is all about.


Saffer, H. and M. Chaloupker, 1998. Demographic differentials in the demand for alcohol and illicit drugs. Working Paper 6432, NBER, Cambridge MA.

Zellner, A., 1971. An Introduction to Bayesian Inference in Econometrics. Wiley, New York.

Zhao, X. and M. N. Harris, 2004. Demand for marijuana, alcohol and tobacco: Participation, levels of consumption and cross-equation correlations. Economic Record, 80, 394-410. (Working paper version.)

© 2014, David E. Giles


  1. Thank you Professor. Could you please explain what are a1 and a2 when you set the prior values? They are not defined in the handout.

    1. A1 and A2 are for the conditional prior variances for beta1 and beta2. It's the usual natural-conjugate prior. The conditional prior variance for beta1 is (sigma1^2)(A1^-1) ; etc.