Wednesday, March 19, 2014

MCMC for Econometrics Students - III

As its title suggests, this post is the third in a sequence of posts designed to introduce econometrics students to the use of Markov Chain Monte Carlo (MCMC, or MC²) methods for Bayesian inference. The first two posts can be found here and here, and I'll assume that you've read both of them already.

We're going to look at another example involving the use of the Gibbs sampler. Specifically, we're going to use it to extract the marginal posterior distributions from the joint posterior distribution, in a simple two-parameter problem. The problem - which we'll come to shortly - is one in which we actually know the answer in advance. That is to say, the marginalizing can be done analytically with some not-too-difficult integration. This means that we have a benchmark against which to judge the results generated by the Gibbs sampler.

Let's look at the inference problem we're going to solve.

Suppose that we have a population that can be described by a Normal distribution with an unknown mean, μ, and an unknown precision parameter, τ. The latter parameter is just τ = 1/σ², where σ is the standard deviation of the population. We want to estimate both μ and τ.

Before we see any sample data, we are a priori completely ignorant about the possible values of the parameters. So, we assign the following "diffuse" joint prior density:

                    p(μ, τ) = p(μ) p(τ) ∝ 1/τ     ;     -∞ < μ < ∞     ;     0 < τ < ∞

Then, we take a sample of n independent values from the population. So, the likelihood function is:

                   L(μ, τ | y) = p(y | μ, τ) ∝ τ^(n/2) exp[-(τ/2) ∑(yᵢ - μ)²]  ,

where y is the vector of n sample values.

So, by Bayes' Theorem, the joint posterior density for the two parameters is:

                  p(μ, τ | y) = [p(μ, τ) L(μ, τ | y) / p(y)] ∝ p(μ, τ) L(μ, τ | y)

                                  ∝ τ^(n/2 - 1) exp[-(τ/2) ∑(yᵢ - μ)²]            .                                       (1)

What we're interested in obtaining are the marginal posterior densities for each of the two parameters.

Now recall that the basic inputs for the Gibbs sampler are the conditional posterior distributions for each of the (groups of) parameters. If we look at equation (1) and treat τ as if it's a constant - i.e., if we condition on τ - we can see that

                p(μ | τ, y) ∝ exp[-(τ/2) ∑(yᵢ - μ)²]

                                 ∝ exp[-(τ/2) {vs² + n(μ - y*)²}]

                                 ∝ exp[-(nτ/2) (μ - y*)²]  ,

where y* is the sample arithmetic mean for the y data; v = (n - 1); and s² is the sample variance.

That is, the conditional posterior for μ is a Normal distribution, with a mean of y* and a variance of (nτ)⁻¹.

Conversely, if we look at equation (1) and treat μ as if it's known - i.e., if we condition on μ - we can see that

               p(τ | μ, y) ∝ τ^(n/2 - 1) exp[-τ {0.5 ∑(yᵢ - μ)²}]  .

So, the conditional posterior for τ is a Gamma distribution, with a shape parameter r = (n/2), and a rate (inverse-scale) parameter λ = 0.5 ∑(yᵢ - μ)².
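In R, these two conditional posteriors map directly onto rnorm() and rgamma(). As a rough sketch (assuming the data are in a vector y of length n, with sample mean ybar, and that mu and tau hold the current values of the parameters), a single pair of conditional draws might look like this:

    # One pass of the Gibbs sampler (illustrative sketch):
    # rnorm() is parameterized by the standard deviation, so sd = 1/sqrt(n*tau);
    # rgamma() is parameterized by shape and rate, so rate = 0.5*sum((y - mu)^2).
    mu  <- rnorm(1, mean = ybar, sd = 1/sqrt(n*tau))
    tau <- rgamma(1, shape = n/2, rate = 0.5*sum((y - mu)^2))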

We now have the information that we need to implement the Gibbs sampler.

Here are the steps we'll follow:
  1. Provide an initial value for τ. Call this τ(0).
  2. Draw a random value from p(μ | τ(0) , y). Call this value μ(1).
  3. Then draw a random value from p(τ | μ(1) , y). Call this value τ(1).
  4. Draw a random value from p(μ | τ(1) , y). Call this value μ(2).
  5. Repeat steps 3 and 4 many, many times.
After a "burn in" period, the values of μ and τ will begin to take values that are consistent with being drawn from the marginal posterior distributions for these parameters, rather than from their conditional posterior distributions. The strings of values for μ and τ associated with the "burn in" will be discarded. 

The thousands of subsequent values of these parameters that we're going to generate will provide us with information about the marginal posterior distributions for μ and τ.

Now, as mentioned above, for this particular inference problem we actually know the forms of the marginal posterior distributions for μ and τ. Specifically, that for μ is Student-t, and that for τ is Gamma. We'll come back to this point below, when we check the accuracy of our Gibbs sampler results.

Let's take a look at the R script that I've written to implement the Gibbs sampler for this problem. It's available on the code page for this blog. As in the previous post on MCMC, the R code is pretty basic. It's written with a view to transparency so that non-users can see what's going on. It could certainly be streamlined a lot, but that's not my concern right now.

So here we go. First, let's initialize some quantities, and draw a sample of n = 10 observations from a Normal population:
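Something along the following lines will do the job. (This is just a sketch, not the exact script from the code page; the seed, the total number of replications, and the variable names are illustrative choices. The sample size, the true values μ = τ = 1, and the 5,000-draw "burn in" are as described in this post.)

    set.seed(123)                       # illustrative seed, for reproducibility
    nrep <- 105000                      # total number of Gibbs draws (illustrative choice)
    nb   <- 5000                        # length of the "burn in"

    n    <- 10                          # sample size
    y    <- rnorm(n, mean = 1, sd = 1)  # data from a population with mu = 1, tau = 1 (so sigma = 1)
    ybar <- mean(y)                     # the sample mean, y*

    mu   <- rep(0, nrep)                # storage for the Gibbs draws of mu
    tau  <- rep(0, nrep)                # storage for the Gibbs draws of tau
    tau0 <- 1                           # initial value, tau(0) (arbitrary choice)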


Now let's go through the loop for the Gibbs sampler itself:
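Continuing the sketch, the loop just alternates draws from the two conditional posteriors derived above:

    # Gibbs sampler: alternate draws from the two conditional posteriors.
    # rnorm() takes a standard deviation; rgamma() takes a shape and a rate.
    for (i in 1:nrep) {
      tau_prev <- if (i == 1) tau0 else tau[i - 1]
      mu[i]    <- rnorm(1, mean = ybar, sd = 1/sqrt(n * tau_prev))
      tau[i]   <- rgamma(1, shape = n/2, rate = 0.5 * sum((y - mu[i])^2))
    }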


Now we'll drop the first 5,000 values of μ and τ for the "burn in":
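In the sketch, that simply amounts to dropping the first nb = 5,000 elements of each vector of draws:

    # Discard the "burn in" draws
    mu_keep  <- mu[-(1:nb)]
    tau_keep <- tau[-(1:nb)]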


Let's see what our results look like. Do we have marginal posteriors for μ and τ that are Student-t and Gamma? We're going to take a look at some skewness and kurtosis measures as part of the basis for assessing our results, so the "moments" package for R has to have been installed, and we'll access it with a "library" command:
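In terms of the sketch above, this step might look like the following (the moments package supplies skewness() and kurtosis(), with kurtosis() returning the raw rather than the excess kurtosis):

    library(moments)                    # for skewness() and kurtosis()

    # Characteristics of the simulated marginal posterior for mu
    mean(mu_keep); median(mu_keep); var(mu_keep)
    skewness(mu_keep); kurtosis(mu_keep)

    # Characteristics of the simulated marginal posterior for tau
    mean(tau_keep); median(tau_keep); var(tau_keep)
    skewness(tau_keep); kurtosis(tau_keep)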


Here are the results at this point. First, we have some of the characteristics of what should be the marginal posterior for μ:


We can see that the posterior mean for μ lines up very nicely, at 1.075. The skewness is essentially zero; and the kurtosis, which should be 4.2 (for a Student-t distribution with v = 9 degrees of freedom, the kurtosis is 3 + 6/(v - 4) = 4.2), is simulated to be 4.29. That's pretty good!

Turning to the marginal posterior for τ, we have:


The Gamma marginal posterior for τ has a shape parameter of v/2 = 4.5, so its skewness should be 2/√4.5 ≈ 0.94, and we have a value of 0.95. Not bad. In addition, the kurtosis should be 3 + 6/4.5 ≈ 4.33, and we have 4.35.

Also, notice that the sample data were drawn from a population with μ = τ = 1. If we had a quadratic loss function, our Bayes estimators of these two parameters would be the marginal posterior means: 1.08 and 1.10, respectively. If we had an absolute error loss function, then the Bayes estimators would be the marginal posterior medians: 1.08 and 1.02, respectively.

And we achieved these results with just n = 10 sample observations.

Finally, let's take a look at the plots for the two marginal posterior densities:
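In the sketch, kernel density estimates of the retained draws are one simple way to produce these plots (the exact plotting commands in the script may differ):

    # Kernel density estimates of the two marginal posteriors
    par(mfrow = c(1, 2))
    plot(density(mu_keep),  main = "Marginal posterior for mu",  xlab = expression(mu))
    plot(density(tau_keep), main = "Marginal posterior for tau", xlab = expression(tau))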



So, the results for the marginal posterior distributions for μ and τ, obtained using the Gibbs sampler, look pretty convincing.

In the next post in this sequence we'll look at a Bayesian inference problem where we can't obtain the results analytically, and the use of the Gibbs sampler is essential to obtaining a solution.


© 2014, David E. Giles

13 comments:

  1. Are you going to do an example using RStan or one of the other Bayesian packages for R? It seems that whenever I see someone explain Bayesian methods, they invariably spend too much time working out algebra to derive the posterior -- and that's just for simple regression.

    It seems to me that you should be able to specify priors and the likelihood function and have the software do the machinations needed to plot the posteriors, obtain summary statistics, etc.

    Bayesian methods are interesting, but unless or until they are as easy to implement in stats packages as frequentist methods, I think they'll remain in the minority.

    1. Brian - absolutely. There are some great packages in R for Bayesian inference, and I'll be illustrating a couple of them that are good for econometricians.

  2. I'm an MCMC practitioner. I agree with much in this post with one exception. A person unsophisticated in MCMC probably should not be writing her/his own. There are packages with better MCMC methods than vanilla Metropolis. Many people know about STAN. My favorite is Emcee Hammer, which is in Python (interface to R?) and has good reviews in the astrophysics community. Google on "emcee hammer monte carlo".

    1. Jonathon - I totally agree with you. I probably should have made it clearer that these pieces are simply designed to give students an idea of the basic principles involved when they are using these tools. While I'm familiar with STAN, I hadn't come across Emcee Hammer - I'll certainly check it out. Thanks for the suggestion.

  3. I've long thought that problems in economics are more like problems in ecology than in physics, so that economists should pay more attention to what ecologists do. Bayesian analyses using MCMC are now pretty common in ecology, and there has been a good textbook available since 2007: Models for Ecological Data, by James S Clark.

    1. There are also several good texts for econometrics - e.g., Greenberg, Lancaster, etc.

  4. I'm finding your MCMC post series very useful, thank you for sharing!

  5. p(μ | τ , y) ∝ exp[-(τ / 2) ∑{vs^2 + (μ - y*)^2}]. Should the RHS be exp[-(τ / 2) {vs^2 + ∑(μ - y*)^2}]?

    1. Thanks - fixed! It didn't affect anything that followed.

  6. This comment has been removed by a blog administrator.

  7. Thanks for this intro. I'm trying to learn this by following along--is it just me or is the code for this part not actually on the code page? (I only see it for parts II and IV. I'm just re-typing it myself, but I'm new to R so simple typos screw me up.)

    1. The code is now there, and I've also updated the post itself a little - the second chart was previously wrong.

