## Tuesday, March 18, 2014

### MCMC for Econometrics Students - II

This is the second in a set of posts about Markov Chain Monte Carlo (MCMC, or MC²) methods in Bayesian econometrics. The background was provided in the first post, where the Gibbs sampler was introduced.

The main objective of the present post is to convince you that this MCMC stuff actually works!

To achieve this, we're going to work through a simple example - one for which we actually know the answer in advance. That way, we'll be able to check the results from the Gibbs sampler against the facts. Hopefully, we'll then see that this technique works - at least for this example!

I'll be using some R script that I've written to take students through this, and it's available on the code page for this blog. I should mention in advance that this code is not especially elegant. It's been written, quite deliberately, in a step-by-step manner to make it relatively transparent to non-users of R. Hopefully, the comments that are embedded in the code will also help.

It's also important to note that this first illustration of the Gibbs sampler in action does not involve the posterior distribution for the parameters in a Bayesian analysis of some model.  Instead, we're going to look at the problem of obtaining the marginals of a bivariate normal distribution, when we know the form of the conditional distributions.

In other words - let's proceed one step at a time. The subsequent posts on this topic will be dealing with Bayesian posterior analysis.

Let's take a look at the set-up, and the analysis that we're going to undertake.

Suppose that we have two random variables, Y1 and Y2, which follow a bivariate normal distribution with mean vector (μ1 , μ2)' and correlation ρ. The variances of Y1 and Y2 will be σ1² and σ2² respectively, and so the covariance between Y1 and Y2 will be ρσ1σ2.

Now, it's well known that the conditional distribution of Y1 given Y2 is

p(y1 | y2) ~ N[μ1 + ρσ1(y2 - μ2)/σ2  ,  σ1²(1 - ρ²)]  ,

and the conditional distribution of Y2 given Y1 is

p(y2 | y1) ~ N[μ2 + ρσ2(y1 - μ1)/σ1  ,  σ2²(1 - ρ²)]  .

We'll take this information as given, and we'll use it (via the Gibbs sampler) to obtain the marginal distributions of Y1 and Y2.

Now, in fact these marginal distributions are known to be

p(y1) ~ N[μ1 , σ1²]
and
p(y2) ~ N[μ2 , σ2²] .

However, we're going to pretend that we don't know this. It means, though, that we can check our MCMC results for accuracy at the end of the exercise.

Without any loss of generality, I'm going to set μ1 = μ2 = 0; and σ1 = σ2 = 1. Regardless of the value of ρ, this means that the marginal distributions for Y1 and Y2 should each be standard normal.

Now, let's recall the steps involved in using the Gibbs sampler in this context. What we'll do is:
1. Choose an initial value for Y1. Let's call this value y1(0).
2. Next, generate a random Y2 value (say, y2(1) ) from p(y2 | y1(0)).
3. Then, generate a new random Y1 value (say, y1(1)) from p(y1 | y2(1)).
4. Repeat steps 2 and 3 many, many times, saving the sequences of Y1 and Y2 values.
5. Throw away the first thousand or so of these values, as they depend on the arbitrary starting value, and the chain may not yet have converged to its stationary distribution.
6. After these "burn in" values have been discarded, the subsequent sequences should contain thousands of values from the marginal distributions of Y1 and Y2. These are the values that we're interested in.
Of course, we have to allow for a sufficiently long "burn in period" for this to work effectively. This is something that we can return to in a later post.

So, let's take a look at my R code. It starts off as follows, by initializing some quantities that we'll need:
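In outline, the set-up is something along these lines. (The full script is on the code page; the value of rho, the 5,000-draw burn-in length, and the seed used here are illustrative choices.)

```r
# Set-up for the Gibbs sampler, with mu1 = mu2 = 0 and sigma1 = sigma2 = 1
set.seed(123)             # illustrative seed, for reproducibility
nburn <- 5000             # number of "burn in" draws to discard (illustrative)
nkeep <- 100000           # number of post-burn-in draws to keep
nrep  <- nburn + nkeep    # total number of Gibbs iterations
rho   <- 0.5              # illustrative correlation between Y1 and Y2
yy1   <- numeric(nrep)    # storage for the Y1 draws
yy2   <- numeric(nrep)    # storage for the Y2 draws
sd    <- sqrt(1 - rho^2)  # conditional std. deviation (same for Y1 and Y2)
```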

After assigning an initial value for Y1 (i.e., step 1 above), we run through the Gibbs sampler loop, and store all of the values drawn from the conditional distributions for Y1 and Y2:
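Using the quantities initialized above, the loop is along these lines (a sketch; the starting value of zero for Y1 is an illustrative choice):

```r
y1 <- 0                              # step 1: initial value for Y1
for (i in 1:nrep) {
  y2 <- rnorm(1, 0, sd) + rho*y1    # step 2: draw from p(y2 | y1)
  y1 <- rnorm(1, 0, sd) + rho*y2    # step 3: draw from p(y1 | y2)
  yy1[i] <- y1                       # store the draws
  yy2[i] <- y2
}
```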

Next, we throw away the "burn in" observations, and just focus on the next 100,000 values of each of Y1 and Y2:
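Something like this, with nburn being the burn-in length set earlier:

```r
yy1b <- yy1[-(1:nburn)]   # the 100,000 retained Y1 draws
yy2b <- yy2[-(1:nburn)]   # the 100,000 retained Y2 draws
```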

Let's plot these values for Y1:
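For example, with a simple trace plot of the retained draws:

```r
plot(yy1b, type = "l", xlab = "Iteration", ylab = "Y1")
```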

This is what we get:

Notice that the values are centered around a level of zero, and very few values fall above 3, or below -3. This is what we'd expect if the distribution is standard normal, which is what the marginal distribution for Y1 is supposed to be.

Now repeat this for Y2:
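That is:

```r
plot(yy2b, type = "l", xlab = "Iteration", ylab = "Y2")
```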

Once again, the results are what you'd expect from a standard normal distribution.

This is a good start, but we need to explore these values in more detail. Let's look at some summary statistics.
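For instance:

```r
summary(yy1b)   # the mean should be essentially zero
summary(yy2b)
var(yy1b)       # the variance should be essentially one
var(yy2b)
```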

The means of the Y1 and Y2 values are essentially zero, and the variances are essentially unity. That's what they should be if these draws are from the marginal distributions, and not from the conditional distributions.

Next, let's look at histograms of the Y1 and Y2 data:
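For example:

```r
hist(yy1b, breaks = 50, main = "Y1 draws", xlab = "Y1")
hist(yy2b, breaks = 50, main = "Y2 draws", xlab = "Y2")
```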

Here are the results:

I'm almost convinced that the 100,000 values for Y1, and those for Y2, come from standard normal distributions.

Let's nail this down with a couple of Q-Q plots and Jarque-Bera tests. (The latter require that we install, and load, the "tseries" package in R.)
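Something along these lines:

```r
library(tseries)             # provides jarque.bera.test()
qqnorm(yy1b); qqline(yy1b)   # Q-Q plot for the Y1 draws
qqnorm(yy2b); qqline(yy2b)   # Q-Q plot for the Y2 draws
jarque.bera.test(yy1b)       # H0: the draws are normally distributed
jarque.bera.test(yy2b)
```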

The Q-Q plots look great, and the p-values associated with the Jarque-Bera tests very strongly suggest that we can't reject normality. And recall that the means and variances lined up with what they should be.

I'm convinced, and I hope that you are too! Our Gibbs sampler seems to have successfully generated 100,000 draws from the marginal distributions for Y1 and Y2.

Once again, the full set of R commands is on the code page for this blog. Some subsequent posts will provide examples of applying the Gibbs sampler to determine the marginal posterior distributions for the parameters in some Bayesian problems.

1. Nice illustration of MCMC!
Using the 'fitDistr' function of my 'propagate' package which fits 21 continuous distributions to data, I get the following (lowest AIC indicating best fitted distribution):

library(propagate)
fitDistr(as.numeric(yy1b))$aic

Distribution AIC
3 Generalized normal -1670.0778
2 Skewed-normal -1669.9984
1 Normal -1668.7873
17 Johnson SB -1668.0871
16 Johnson SU -1668.0185
19 4P Beta -1666.9396
5 Scaled/shifted t- -1666.7934
18 3P Weibull -1532.8170
11 Generalized Trapezoidal -1471.0355
6 Logistic -1308.7140
9 Trapezoidal -1245.7064
8 Triangular -1226.8029
21 von Mises -1194.2546
15 Gumbel -972.6333
14 Laplace -881.2294
13 Cauchy -774.7329
10 Curvilinear Trapezoidal -728.3053
7 Uniform -449.5577
20 Arcsine -313.5556
12 Gamma -305.0387
4 Log-normal NA

The first three winners are variants of the normal distribution, which supports your findings!

Cheers,
Andrej

2. Why, in the second and third steps, do the conditional distributions for Y2 and Y1 both have standard deviation equal to sd? I don't see it from the formula you gave above.

1. Because, as I stated, sigma1^2 and sigma2^2 have each been set to 1.

2. Thanks for the reply. But we don't know they are equal to 1 at first; they are just sd. Am I wrong?

3. If you look carefully, the ENTIRE code is written specifically for the case where mu1 = mu2 = 0, as well as sigma1 = sigma2 = 1. Just look at the lines where Y1 and Y2 are generated. You can generalize this if you want.

4. You mean: y2 <- rnorm(1, 0, sd) + rho*y1? sd is not 1 in this case, right?

5. Exactly. The 1 means one observation, the zero is the (zero) mean, and sd is defined above. Then we add the term to shift the mean.

6. You set sd = sqrt(1-rho^2), right?

7. Yes - just look at the last line in the first block of code in the post itself. It's right there!

3. very informative. Thanks.