Monte Carlo (MC) simulation provides us with a very powerful tool for solving all sorts of problems. In classical econometrics, we can use it to explore the properties of the estimators and tests that we use. More specifically, MC methods enable us to mimic (computationally) the sampling distributions of estimators and test statistics in situations that are of interest to us. In Bayesian econometrics we use this tool to actually construct the estimators themselves. I'll put the latter to one side in what follows.
To get the discussion going, suppose that we know that a particular test statistic has a sampling distribution that is Chi Square asymptotically - that is when the sample size is infinitely large - when the null hypothesis is true. That's helpful, up to a point, but it provides us with no hard evidence about that sampling distribution if the sample size that we happen to be faced with is just n = 10 (say). And if we don't know what the sampling distribution is, then we don't know what critical region is appropriate when we apply the test.
In the same vein, we usually avoid using estimators that are "inconsistent". This implies that our estimators are (among other things) asymptotically unbiased. Again, however, this is no guarantee that they are unbiased, or even have acceptably small bias, if we're working with a relatively small sample of data. If we want to determine the bias (or variance) of an estimator for a particular finite sample size (n), then once again we need to know about the estimator's sampling distribution. Specifically, we need to determine the mean and the variance of that sampling distribution.
And there's no "magic number" above which the sample size can be considered large enough for the asymptotic results to hold. It varies from problem to problem, and often it depends on the unknown parameters that we're trying to draw inferences about. In some situations a sample size of n = 30 will suffice, but in other cases thousands of observations are needed before we can appeal to the asymptotic results.
And there's no "magic number" above which the sample size can be considered large enough for the asymptotic results to hold. It varies from problem to problem, and often it depends on the unknown parameters that we're trying to draw inferences about. In some situations a sample size of n = 30 will suffice, but in other cases thousands of observations are needed before we can appeal to the asymptotic results.
If we can't figure out the details of the sampling distribution for an estimator or a test statistic by analytical means - and sometimes that can be very, very difficult - then one way to go forward is to conduct some sort of MC simulation experiment. Indeed, using such simulation techniques hand-in-hand with analytical tools can be very productive when we're exploring new territory.
I'll be devoting several upcoming posts to this important topic. In those posts I'll try and provide an easy-to-follow introduction to MC simulation for econometrics students.
Before proceeding further, let's take a brief look at some of the history of MC simulation. After all, knowing where it came from may be enlightening when we deal with its details and consider its strengths and weaknesses. And a bit of history is good for all of us! Right?
Some of the first applications of what we'd now call MC simulation arose in the context of geometric probability. A classic example is the problem of getting a good approximation to the value of π. The standard probabilistic approach that's cited is the so-called Buffon's Needle experiment, generally attributed to Georges-Louis Leclerc, Comte de Buffon, in the 18th century. The experiment involves dropping a needle (or a matchstick, say) onto a sheet of paper on which parallel lines have been drawn. The proportion of times that the needle crosses one of the lines, together with the needle's length and the spacing between the lines, can be manipulated to "simulate" the value of π. I won't go into further details here, but a simple explanation can be found here, along with a nice interactive simulator that you can play around with to see how well this works.
When I'm first introducing my undergraduate class to MC simulation I always use another solution to this problem of determining the value of π. It involves a circle inscribed in a square, and random draws of coordinates within the square. I've described it in detail in this earlier post, and you may want to take a look at it. (It turns out that this can be viewed as an example of MC integration of a function, but I won't digress.)
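For anyone who wants to see the mechanics right away, here's a minimal sketch of that idea in Python. I've used the equivalent quarter-circle-in-a-unit-square version, and the seed and sample sizes are purely illustrative; the earlier post has the full explanation.

```python
import numpy as np

rng = np.random.default_rng(seed=123)   # seeded so the results can be reproduced

def estimate_pi(n_draws):
    # Draw points uniformly over the unit square [0, 1] x [0, 1].
    x = rng.uniform(0.0, 1.0, n_draws)
    y = rng.uniform(0.0, 1.0, n_draws)
    # The quarter-circle of radius 1 has area pi/4, while the square has area 1,
    # so the proportion of points landing inside the quarter-circle estimates pi/4.
    inside = (x**2 + y**2) <= 1.0
    return 4.0 * inside.mean()

for n in (100, 10_000, 1_000_000):
    print(n, estimate_pi(n))
```

As you'd expect, the approximation improves (slowly) as the number of random points increases.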
Students seem to like this illustration - it convinces them that there may be something to this MC simulation thing!
It seems that the origins of "modern" MC analysis can be traced to the scientists associated with the "Manhattan Project" and the subsequent research into nuclear physics at the Los Alamos National Laboratory. Some of the mathematical luminaries who were part of that team in the post-WWII period included John von Neumann, Stan Ulam, and Nicholas Metropolis. The term "Monte Carlo" was apparently first suggested by Metropolis, in reference to the famous casino town in Monaco. (Remember James Bond and Casino Royale?) A secret code name was needed for the new simulation technique, and this name was chosen because Ulam had an uncle who loved to frequent the Monte Carlo casinos. Metropolis (1987) provides an interesting account of the history of these activities at the Los Alamos Laboratory, and the connections between them and the development of the first serious computers, such as ENIAC.
One of the big challenges facing these and other scientists (such as Enrico Fermi, who was also at Los Alamos at that time) was how to accurately evaluate complex, non-standard, multi-dimensional integrals. The latter arose as the solutions to the differential equations that the physicists were using to model thermo-nuclear reactions. See Metropolis and Ulam (1949).
Integrals - yuk!
But bear with me, and you'll see why this is actually interesting!
For simplicity let's suppose that we want to evaluate the following integral:
θ = ∫ f(x)dx, where x is a scalar variable, and the range of integration is 0 < x < 1.
We'd probably be able to figure out a way to solve this integration problem analytically. Even if we couldn't, because the integral is only one-dimensional (and over a finite range), standard numerical quadrature methods, such as various generalizations of Simpson's rule, can be used to get a really accurate approximation very easily. However, once we move to multi-dimensional integrals the story changes dramatically. Numerical quadrature quickly becomes computationally infeasible as the dimensionality of the problem increases.
So, if you can't get an analytical solution, you're kind of stuck.
Enter the team at Los Alamos!
They figured out how to simulate the integral - even in the nasty multi-dimensional case - by using random numbers and the statistical results that we associate with the various Central Limit Theorems and Laws of Large Numbers.
Let's continue with the simple integral, θ, given above. It will be useful as a device for illustrating the basic idea. Before we do so, notice one thing. In that expression the range of integration was 0 < x < 1. That's not really restrictive. If the range was a < x < b, we'd just make the change of variable,
y = (x - a) / (b - a), so that 0 < y < 1.
We then have (noting that dx = (b - a)dy):
θ = (b - a) ∫ f[a + (b - a)y]dy, or θ = ∫ g(y)dy, where g(y) = (b - a) f[a + (b - a)y].
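If you'd like to convince yourself of this numerically, here's a quick sketch. The integrand f(x) = x² and the interval (2, 5) are chosen purely for illustration (the true value of the integral is 39), and the midpoint rule is just a convenient way to approximate both sides of the identity:

```python
import numpy as np

f = lambda x: x**2        # illustrative integrand
a, b = 2.0, 5.0           # illustrative range of integration; true integral = 39

m = 10_000                                   # number of midpoint-rule points
x = a + (b - a) * (np.arange(m) + 0.5) / m   # midpoints on (a, b)
y = (np.arange(m) + 0.5) / m                 # midpoints on (0, 1)

lhs = (b - a) * f(x).mean()                  # midpoint rule for the original integral
rhs = ((b - a) * f(a + (b - a) * y)).mean()  # midpoint rule for the transformed integral

print(lhs, rhs)   # both are approximately 39.0
```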
So, let's stick to a range of integration of zero to one, and this will help because I'm going to introduce the simplest of all continuous random numbers - ones that vary uniformly on the zero-one interval.
Specifically, let U be such a continuous random variable, so its density is p(u) = 1, 0 < u < 1; = 0, otherwise. Now, notice that if we wanted to estimate the integral, θ, a natural starting point would be to draw a random u, and use as our estimator, θ* = f(u). Why is this a natural starting point? Well, for a start, θ* is an unbiased estimator of the integral, θ.
To see this, just note that E[θ*] = E[f(u)] = ∫ f(u)p(u)du = ∫ f(u)du = ∫ f(x)dx = θ. In each case, the range of integration is from zero to one.
(The penultimate equality in this last expression follows because u and x are just labels - we can use any label for the variable of integration.)
So, as long as we know how to obtain (generate) a random value on the zero-one interval, we can get an unbiased estimator very easily. As an aside, generating such (pseudo-) random numbers was another problem that the Los Alamos team grappled with.
One of the earliest methods that von Neumann came up with was the (not-so-successful) "middle-square" algorithm. But that's a story for another post. These days, of course, anyone with access to a computer can generate such numbers at will.
Now, although θ* is an unbiased estimator of θ, you might guess that it still isn't that great. After all, it's based on just one piece of random information. The values that θ* takes (i.e., the estimates of the integral) can vary dramatically as we draw different values of u. In our parlance, this will be a very "inefficient" estimator. You can probably guess already what we're going to do next!
We're going to repeat the exercise. We'll draw N independent values of u, and in each case we'll construct θ* as above. Then we'll take the average of the N different estimates that we've obtained.
Averaging will reduce the variability, and will result in a more efficient estimator. And if N is big enough, I'm sure you know already that we can appeal to the Law of Large Numbers to guarantee that our empirical average will converge (loosely speaking) to θ - the integral that we're interested in - while the Central Limit Theorem tells us how the estimation error is distributed around θ.
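Here's a minimal sketch of this "crude" Monte Carlo estimator in Python. The integrand f(x) = exp(x) is chosen purely for illustration, because its integral over (0, 1) is e - 1, so we can check how well the simulation does; the seed and the values of N are also just illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seeded so the results can be reproduced

def mc_integral(f, n_draws):
    # Crude Monte Carlo: draw N independent U(0, 1) values and average f over them.
    u = rng.uniform(0.0, 1.0, n_draws)
    return f(u).mean()

f = np.exp                  # illustrative integrand; the true value of the integral is e - 1
true_value = np.e - 1.0

for n in (10, 1_000, 100_000):
    estimate = mc_integral(f, n)
    print(f"N = {n:>7}: estimate = {estimate:.5f}, error = {estimate - true_value:+.5f}")
```

Running this a few times with different seeds makes the point nicely: for small N the estimates bounce around a lot, but as N grows they settle down close to the true value.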
How fast that convergence takes place may still be an issue. We may need a very large value of N to get reasonable accuracy, and this may be expensive, computationally. There are various ways that we can speed up this process, and this is something that will be covered in a later post on MC simulation.
This basic idea extends naturally to the case of nasty, multi-dimensional, integrals.
O.K., so now we see that we can use statistical methods, and a computer, to evaluate integrals. What's the connection between this and the sort of econometric problems that I mentioned at the start of this post?
Well, the connection is really tight. If we want to determine the bias of an estimator, we need to compare the mean of its sampling distribution with the true value of the parameter that we're trying to estimate.
To get the mean of the sampling distribution we have to evaluate an integral. Aha! Similarly, if we want to determine the efficiency of an estimator, we have to obtain its variance, and that also involves obtaining a particular expected value, and hence evaluating a (different) integral, and so on.
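To preview how that works in practice, here's a minimal sketch in Python. I've used the maximum-likelihood estimator of a normal population's variance (the one that divides by n rather than n - 1) purely as a simple illustration, because its true bias, -σ²/n, is known and can be compared with the simulated value; the sample size, true variance, seed, and number of replications are all just illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=2024)   # seeded so the results can be reproduced

n = 10                 # the (small) sample size we care about
sigma2 = 4.0           # true variance of the data-generating process
n_reps = 100_000       # number of Monte Carlo replications

estimates = np.empty(n_reps)
for r in range(n_reps):
    sample = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=n)
    # ML estimator of the variance: divide by n, not (n - 1).
    estimates[r] = np.mean((sample - sample.mean())**2)

# The average of the replicated estimates approximates the mean of the estimator's
# sampling distribution; comparing it with sigma2 gives the (simulated) bias.
print("simulated bias:  ", estimates.mean() - sigma2)
print("theoretical bias:", -sigma2 / n)
```

The simulated bias should come out very close to -0.4 here, matching the known theoretical result - which is exactly the kind of check we can't do when the finite-sample bias is unknown, and exactly where MC simulation earns its keep.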
It looks as if MC simulation may have quite a bit to offer us if we want to explore the properties of the various estimators and test statistics that we use. I'll be discussing this in much more detail, but still in an introductory manner, in a series of posts over the next short while.
References:
Metropolis, N., 1987. The beginning of the Monte Carlo method. Los Alamos Science, Special Issue, 125-130.
Metropolis, N. & S. Ulam, 1949. The Monte Carlo method. Journal of the American Statistical Association, 44, 335-341.