In a recent post I mentioned my long-standing interest in Bayesian Econometrics. When I teach this material I usually include a simple application that involves estimating a consumption function using U.S. time-series data. I used to have this coded up for the SHAZAM package. EViews isn't appropriate as it doesn't include a numerical integration routine.
You could use BUGS, or some other package, but it's nice to see what is going on, step-by-step, when you're encountering this stuff for the first time.
The other day, I thought, "It's time to code this up in R". So, here we are!
Here's how I deal with the example. We have annual U.S. data on real private consumption expenditure and real disposable income, for the period 1950 to 1985. (See the data page for this blog.) We're going to estimate a really simple consumption function. To reduce the number of parameters for which prior information has to be assigned, we'll measure the data as deviations about sample means, thereby eliminating the intercept. So, the model is:
Ct = βYt + εt ; t = 1, 2,......., 36
where C and Y denote consumption and income.
Let's assume that the errors are independently and normally distributed.
Now, let's put on our Bayesian hat! We know that the m.p.c. (β) must lie between zero and one in value. Otherwise the results make no economic sense. As far as σ is concerned, all I know is that it must be positive and finite in value.
I'm going to put an improper ("diffuse") marginal prior on σ, and a Beta marginal prior on β. The latter reflects the fact that this parameter must lie between zero and one. Assuming that the prior information about σ is independent of that about β, the joint prior for β and σ is:
p(β, σ) = p(β) p(σ)
p(σ) α (1 / σ) ; 0 < σ < ∞
p(β) α β (α - 1)(1 - β)(γ - 1) ; 0 < β < 1 ; α, γ > 0
where "α" denotes "is proportional to".
The parameters, α and γ, of the marginal prior for β will be assigned values to reflect our prior information about the m.p.c., before we see the current sample of data.
Here's how we'll do that. Recall that the mean and variance of a Beta distribution are:
Mean = m = α / (α + γ)
Variance = v = (α γ) / [ (α + γ)2 (α + γ + 1) ] .
If we assign values to the mean and the variance, to reflect our prior beliefs, the two equations above can be solved for the implied values of α and γ. Specifically:
α = (m / v) (m (1 - m) - v)
γ = (1 - m) [m (1 - m) - v] / v .
Having chosen values for m and v, we have now fully specified the joint prior for β and σ.
Given the normality of the errors, the likelihood function takes the form
L(β, σ | Data) = p(Data | β, σ) α (1 / σ)n exp[ -Σ (Ct - βYt)2 ] / (2σ2) .
Here, "Data" refers to the random consumption data. Then, by Bayes' Theorem, the joint posterior p.d.f. for β and σ is:
p(β, σ | Data) α p(β, σ) L(β, σ | Data)
α (1 / σ)(n + 1) β (α - 1)(1 - β)(γ - 1) exp[ -Σ (Ct - βYt)2 ] / (2σ2) .
We can marginalize this joint posterior pd.f. by integrating it with respect to σ, over the range (0 , ∞). This integration can be performed analytically, quite easily. The details can be found here.
The marginal posterior for β is then given by:
p(β | Data) α β (α - 1)(1 - β)(γ - 1) [ Σ (Ct - βYt)2 ] (-n / 2) .
We now need to compute the normalizing constant for this marginal posterior density - the constant that ensures that this p.d.f. integrates to unity. We simply take the last expression above, integrate it with respect to β over the interval [0, 1], and then the reciprocal of the value of this integral is the normalizing constant.
The integration with respect to β is readily done numerically, given the closed interval for the β values.
To illustrate this, using the U.S. consumption data, and setting the prior mean and variance for β to m = 0.75 and v = 0.0005 respectively. The latter values imply choices for the parameters of the prior for β of α = 280.5, and γ = 93.5.
This is what we get when we run the R code available in the code page for this blog:
The nice thing about being a Bayesian is that I don't have to justify to you where I got the values for the prior mean and variance from! (Just kidding, of course :-) .)
Once we have the normalized marginal posterior for β, we can compute the mean and variance of this p.d.f. as:
E[β | Data] = ∫ β p(β | Data) dβ
Var[β | Data] = ∫ (β - E[β | Data] )2 p(β | Data) dβ .
Again, these integrals are easy to compute numerically, given the range for β.
In our example, we get E[β | Data] = 0.8810, and Var[β | Data] = 0.000219.
Under a quadratic loss function, E[β | Data] is the Bayes estimator of β. Other loss functions imply different Bayes estimators of β.
For our example, under a zero-one loss function, the posterior mode is the Bayes estimator. It's simple to check that the mode of p(β | Data) occurs at β = 0.8792. We get essentially the same estimator under either loss function in this particular example.
Now, lets suppose that non-Bayesian (yes, there really are such people!) comes along and decides to use OLS estimation. Given the normality of our errors, that's the same as MLE as far as the estimation of β is concerned. This, in turn, is identical to the Bayes estimator we'd get if we used the following totally "diffuse" prior for β and σ:
p(β, σ) = p(β) p(σ),
where p(β) α constant ; -∞ < β < ∞
and p(σ) α (1 / σ) ; 0 < σ < ∞
We could consider this to be our "reference prior". The associated results will enable us to see how much "influence" our chosen prior information had on the results given above.
Here are the OLS (MLE) regression results from R:
Also notice that while the prior variance for β was 0.005, and the variance of the posterior distribution for β, is 0.00219. Adding the sample information to the prior information leads to a gain in estimation precision, as we'd expect.
A Bayesian posterior interval for β can also be constructed, but I won't pursue that here.
Here are some results you get when you vary the prior information:
Hopefully, this example will be useful to some of you. Put on your Bayesian hat, and have fun!
[Technical Note: The likelihood function that's plotted above is actually the marginal (or integrated) likelihood function. The full likelihood function is the joint data density, viewed as a function of the two parameters, β and σ. So that I can plot the likelihood here, I've marginalized it by integrating out σ. The integration to get the kernel of this function can be done analytically, using a similar approach to that used above to marginalize the posterior with respect to σ. Then the normalizing constant is computed numerically.]
© 2012, David E. Giles