Saturday, January 16, 2016

Why Does "Pi" Appear in the Normal Density?

Every now and then a student will ask me why the formula for the density of a Normal random variable includes the constant π, or more correctly 1/√(2π).

The answer is that this term ensures that the density function is "proper" - that is, the integral of the function over the full real line takes the value "1". The area under the density, or "total probability", is "1".
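As a quick numerical sanity check (a sketch, not part of the proof), we can approximate the area under the bare "bell shape" exp(-z²/2) with a simple trapezoid rule and compare it with √(2π). The helper names `trapezoid` and `bell`, the grid size, and the truncation of the real line to [-10, 10] are my own illustrative choices.

```python
import math

def trapezoid(f, a, b, n=20_000):
    """Composite trapezoid rule: approximates the integral of f over [a, b] using n panels."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def bell(z):
    """The Normal 'shape' exp(-z**2 / 2), without any constant in front."""
    return math.exp(-0.5 * z * z)

# Truncate the real line to [-10, 10]; the area outside that interval is below 1e-20.
raw_area = trapezoid(bell, -10.0, 10.0)
print(raw_area, math.sqrt(2 * math.pi))   # the two numbers agree: about 2.5066
print(raw_area / math.sqrt(2 * math.pi))  # rescaling by 1/sqrt(2*pi) gives an area of 1
```

Without the constant the area is about 2.5066, not 1, so the factor 1/√(2π) is exactly what is needed to make the density proper.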

Some students are happy with this (partial) answer, but others want to see a proof. Fair enough!

However, there's a trick to proving that this integral (area) is "1" in value. Let's take a look at it.
Any student who's taken a course in statistical distribution theory will know about the trick. However, this rules out most econometrics students, so it's worth discussing it here.

To begin with, because we can always "standardize" a Normal variable that has an arbitrary mean and variance into one that has a zero mean and a unit variance, we can just focus on the latter random variable, say Z.
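To see that standardizing loses nothing, here is a minimal sketch: if X has mean μ and standard deviation σ, its density is (1/σ)p((x − μ)/σ), where the 1/σ is the Jacobian of the substitution z = (x − μ)/σ, and the area stays at 1. The function names, the (μ, σ) pairs, and the ±12σ truncation are hypothetical choices for illustration.

```python
import math

def std_normal_pdf(z):
    """Density of Z ~ N(0, 1): exp(-z**2 / 2) / sqrt(2*pi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def normal_pdf(x, mu, sigma):
    """Density of X ~ N(mu, sigma**2), expressed through z = (x - mu) / sigma."""
    return std_normal_pdf((x - mu) / sigma) / sigma

def normal_area(mu, sigma, n=20_000):
    """Midpoint-rule area under normal_pdf over mu +/- 12*sigma (the rest is negligible)."""
    a, b = mu - 12 * sigma, mu + 12 * sigma
    h = (b - a) / n
    return h * sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(n))

# Arbitrary illustrative (mu, sigma) pairs.
for mu, sigma in [(0.0, 1.0), (5.0, 2.0), (-3.0, 0.5)]:
    print(mu, sigma, normal_area(mu, sigma))  # each area is close to 1
```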

The formula for the density function of Z is

$$p(z) = (2\pi)^{-1/2} e^{-z^2/2}, \qquad -\infty < z < \infty$$

Because 1/√(2π) is positive (we take the positive square root here), and the exponential of any value is also positive, the density function itself must be positive. Its plot is the familiar symmetric, bell-shaped curve, centred at zero.
Let A denote the full area under the density function. We want to show that A = 1. Given that A > 0, it will suffice to show that A² = 1. This might sound like a harder task, but actually it's easier - and that's where the trick comes in.

We need to recall a bit of high school trigonometry. We're also going to use the fact that there's more than one "co-ordinate system" that can be used when locating a point on a plane.

Typically, we use the Cartesian co-ordinate system - courtesy of René Descartes. In this system we define an arbitrary origin, and we mark off X and Y axes that are orthogonal to each other and intersect at this origin.
Then, by going a distance x from the origin along the X axis, and a distance y along the Y axis from the origin, we can locate the unique point (x, y).

From there, we can plot functions of the form y = f(x), and so on. Nothing new about this, right?

Alternatively, we can locate any point on a plane by using the Polar co-ordinate system. See here for some information about the history of this system.

In this case, we again assign an origin, but then we choose an angle, θ, and a radius, r. In other words, we decide in which direction from the origin we're going to move, and how far we're going to go in that direction. The point lies a distance r from the origin, along the ray that makes the angle θ with a fixed reference axis.
Any point that's defined by the Cartesian co-ordinates (x , y) can be defined equivalently by the Polar co-ordinates (r , θ).
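A small sketch of the two-way conversion, assuming the usual convention that θ is measured anticlockwise from the positive X axis (equation (2) later in the post measures θ from the other axis, which works just as well). The function names `to_polar` and `to_cartesian` are hypothetical.

```python
import math

def to_polar(x, y):
    """Cartesian (x, y) -> Polar (r, theta), with theta in [0, 2*pi)."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x) % (2 * math.pi)  # atan2 returns a value in [-pi, pi]; shift it into [0, 2*pi)
    return r, theta

def to_cartesian(r, theta):
    """Polar (r, theta) -> Cartesian (x, y), measuring theta anticlockwise from the X axis."""
    return r * math.cos(theta), r * math.sin(theta)

for point in [(1.0, 0.0), (0.0, 2.0), (-3.0, 4.0), (1.0, -1.0)]:
    r, theta = to_polar(*point)
    x, y = to_cartesian(r, theta)
    print(point, "->", (round(r, 4), round(theta, 4)), "->", (round(x, 10), round(y, 10)))
```

Converting and converting back recovers the original point, which is what "defined equivalently" means here.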

Now if we place a circle of radius r, centred at the origin, on the plane, you might start to think about the number π. After all, the area of the circle is πr², its circumference is 2πr, and if we "sweep" the radius line all the way around the circle, the angle that we trace out amounts to 2π radians.


This looks promising!

So, let's go back to that (squared) area under the normal density. I'm going to write it as:

$$A^2 = \left[\int_{-\infty}^{\infty} (2\pi)^{-1/2} e^{-z_1^2/2}\,dz_1\right]\left[\int_{-\infty}^{\infty} (2\pi)^{-1/2} e^{-z_2^2/2}\,dz_2\right], \qquad (1)$$

where the ranges of integration are from -∞ to ∞ in each case.

(Because z₁ and z₂ are just the variables with respect to which we're integrating, we can give them any labels we like.)
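The product of the two integrals in (1) is the same thing as a single double integral over the whole (z₁, z₂) plane of (1/2π)exp(-(z₁² + z₂²)/2). Here is a rough numerical sketch of that double integral, using a midpoint grid on the square [-8, 8] × [-8, 8]; the truncation and the grid size are my own assumptions.

```python
import math

def joint_density(z1, z2):
    """Product of two standard Normal densities: exp(-(z1**2 + z2**2) / 2) / (2*pi)."""
    return math.exp(-0.5 * (z1 * z1 + z2 * z2)) / (2 * math.pi)

# Midpoint rule on an n-by-n grid over [-L, L] x [-L, L]; the mass outside is negligible.
L, n = 8.0, 400
h = 2 * L / n
grid = [-L + (i + 0.5) * h for i in range(n)]
A_squared = h * h * sum(joint_density(z1, z2) for z1 in grid for z2 in grid)
print(A_squared)  # very close to 1
```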

Often, to evaluate an integral, we use a change of variable. Here, we have two variables, z₁ and z₂. I'm going to use the (dual) change of variables,

$$z_1 = r\sin\theta, \qquad z_2 = r\cos\theta \qquad (2)$$

You always knew that high school trigonometry would come in useful one day, didn't you?
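The reason this particular substitution helps is that z₁² + z₂² = r²(sin²θ + cos²θ) = r², so the two exponential terms in (1) merge into the single term exp(-r²/2). A quick check at randomly chosen points follows; the helper name `polar_to_z` and the random seed are illustrative assumptions.

```python
import math
import random

def polar_to_z(r, theta):
    """The change of variables in (2): z1 = r*sin(theta), z2 = r*cos(theta)."""
    return r * math.sin(theta), r * math.cos(theta)

random.seed(1)  # arbitrary seed, for reproducibility only
for _ in range(5):
    r = random.uniform(0.0, 4.0)
    theta = random.uniform(0.0, 2 * math.pi)
    z1, z2 = polar_to_z(r, theta)
    # Product of the two exponential terms in (1), evaluated at (z1, z2) ...
    lhs = math.exp(-0.5 * z1 * z1) * math.exp(-0.5 * z2 * z2)
    # ... equals exp(-r^2 / 2), because sin^2 + cos^2 = 1.
    rhs = math.exp(-0.5 * r * r)
    print(f"r={r:.3f} theta={theta:.3f} match={math.isclose(lhs, rhs, rel_tol=1e-12)}")
```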

Notice two things about the transformation given above in (2):

1.  It's one from the Cartesian co-ordinate system to the Polar co-ordinate system.

2.  It's between two quantities (r and θ) and two variables (z₁ and z₂).

The transformation needs to be "balanced" in this way. That's why we needed to work with A2, and not with just A itself. In the latter case we would have had only one z variable, but still two quantities that define the Polar co-ordinates.

When we make a change of variable in a simple integration problem, we have to keep track of two things: the derivative of the function that defines the transformation, and the range of integration itself.

The same thing applies here, with our two-equation transformation in (2).

More precisely, we need to take into account the "Jacobian" of the mapping from (r, θ) to (z₁, z₂).

This Jacobian (J) is defined in our case as the determinant of the (2 x 2) matrix whose elements (in the order (1,1), (1,2), (2,1), and (2,2)) are (∂z₁/∂r), (∂z₁/∂θ), (∂z₂/∂r), and (∂z₂/∂θ). These derivatives are:

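$$\frac{\partial z_1}{\partial r} = \sin\theta, \qquad \frac{\partial z_1}{\partial \theta} = r\cos\theta,$$
$$\frac{\partial z_2}{\partial r} = \cos\theta, \qquad \frac{\partial z_2}{\partial \theta} = -r\sin\theta.$$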

Now, remembering from high school that sin²(θ) + cos²(θ) = 1, we have

$$J = \sin\theta\,(-r\sin\theta) - (r\cos\theta)\cos\theta = -r\,(\sin^2\theta + \cos^2\theta) = -r.$$
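The determinant can also be checked numerically, without any algebra, by approximating the four partial derivatives with central finite differences. This is a sketch under my own choices: the step size `h` and the helper names `z_of` and `jacobian_det` are hypothetical.

```python
import math

def z_of(r, theta):
    """The mapping (r, theta) -> (z1, z2) from (2)."""
    return r * math.sin(theta), r * math.cos(theta)

def jacobian_det(r, theta, h=1e-6):
    """Determinant of the 2x2 matrix of partial derivatives, by central finite differences."""
    z1_rp, z2_rp = z_of(r + h, theta)
    z1_rm, z2_rm = z_of(r - h, theta)
    z1_tp, z2_tp = z_of(r, theta + h)
    z1_tm, z2_tm = z_of(r, theta - h)
    dz1_dr, dz2_dr = (z1_rp - z1_rm) / (2 * h), (z2_rp - z2_rm) / (2 * h)
    dz1_dt, dz2_dt = (z1_tp - z1_tm) / (2 * h), (z2_tp - z2_tm) / (2 * h)
    # Rows: (dz1/dr, dz1/dtheta) and (dz2/dr, dz2/dtheta), as in the text.
    return dz1_dr * dz2_dt - dz1_dt * dz2_dr

for r, theta in [(0.5, 0.3), (1.0, 2.0), (2.5, 4.0)]:
    print(r, theta, round(jacobian_det(r, theta), 6))  # approximately -r each time
```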

The absolute value of the Jacobian needs to be used - recall that the area under the density is positive, and we mustn't alter that property.

So, the area element dz₁ dz₂ becomes |J| dr dθ = r dr dθ; in terms of densities, p(r, θ) = r p(z₁, z₂). As for the ranges of integration, we have 0 < r < ∞ and 0 < θ < 2π (radians).
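Putting the pieces together, the joint density in polar co-ordinates is (1/2π) r exp(-r²/2). It splits into a uniform density 1/(2π) for θ on (0, 2π) and the density r exp(-r²/2) for r, so the 1/(2π) is simply "one over the full sweep of angles". A hedged Monte Carlo sketch follows; the sample size, seed, and the use of `random.gauss` to draw the two independent standard Normals are my own choices, and the output is approximate.

```python
import math
import random

random.seed(42)  # arbitrary seed, for reproducibility only
N = 200_000
rs, thetas = [], []
for _ in range(N):
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    rs.append(math.hypot(z1, z2))
    # Invert (2): z1 = r sin(theta), z2 = r cos(theta), so theta = atan2(z1, z2), shifted into [0, 2*pi).
    thetas.append(math.atan2(z1, z2) % (2 * math.pi))

# If the (r, theta) density is (1 / (2*pi)) * r * exp(-r**2 / 2), then theta is uniform on [0, 2*pi) ...
print("mean theta:", sum(thetas) / N, "vs pi =", math.pi)
# ... and P(R <= 1) = integral of r*exp(-r**2/2) from 0 to 1 = 1 - exp(-1/2).
print("P(R <= 1):", sum(r <= 1.0 for r in rs) / N, "vs", 1 - math.exp(-0.5))
```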

With all of this in mind, we have the following result from (1):

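$$A^2 = \int_0^{2\pi}\!\!\int_0^{\infty} (2\pi)^{-1/2} e^{-\frac{1}{2} r^2 \sin^2\theta}\,(2\pi)^{-1/2} e^{-\frac{1}{2} r^2 \cos^2\theta}\, r\,dr\,d\theta, \qquad (3)$$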

where the first range of integration is from 0 to 2π, and the second range of integration is from 0 to ∞.
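Before simplifying, equation (3) can be evaluated directly on a polar grid as a numerical check. This is a sketch only: the midpoint rule, the cut-off `R_MAX` standing in for ∞, and the grid sizes are assumptions of mine.

```python
import math

def integrand_3(r, theta):
    """Integrand of (3): the two standard Normal densities at (r sin(theta), r cos(theta)), times |J| = r."""
    c = (2 * math.pi) ** -0.5
    return (c * math.exp(-0.5 * (r * math.sin(theta)) ** 2)
            * c * math.exp(-0.5 * (r * math.cos(theta)) ** 2)
            * r)

# Midpoint rule: theta over [0, 2*pi], r over [0, R_MAX] (R_MAX truncates infinity).
R_MAX, n_r, n_theta = 12.0, 2_000, 64
h_r, h_t = R_MAX / n_r, 2 * math.pi / n_theta
A_squared = h_r * h_t * sum(
    integrand_3((i + 0.5) * h_r, (j + 0.5) * h_t)
    for i in range(n_r)
    for j in range(n_theta)
)
print(A_squared)  # close to 1
```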

Again, using the result that sin²(θ) + cos²(θ) = 1, we can simplify (3):

$$A^2 = \frac{1}{2\pi}\int_0^{2\pi}\!\!\int_0^{\infty} r\,e^{-r^2/2}\,dr\,d\theta = \int_0^{\infty} r\,e^{-r^2/2}\,dr. \qquad (4)$$

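(Again, in the middle expression of (4) the ranges of integration are from 0 to 2π and from 0 to ∞, respectively; in the final expression the range is from 0 to ∞. Because the inner integral doesn't depend on θ, integrating over θ simply contributes a factor of 2π, which cancels the 1/(2π) in front.)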

It's easy to show that this last integral in (4) takes the value "1". (Just make the change of variable x = r²/2, so that dx = r dr, and note that the integral of exp(-x) from zero to infinity is "1".)
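Written out in full, the substitution runs as follows (with x = r²/2 and dx = r dr; x runs from 0 to ∞ as r does):

```latex
\int_0^{\infty} r\,e^{-r^2/2}\,dr
  = \int_0^{\infty} e^{-x}\,dx
  = \Big[-e^{-x}\Big]_0^{\infty}
  = 0 - (-1)
  = 1
```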

So, there we have it. A² = 1, and so A = 1 as well, because we noted already that A > 0.

That's where that "magic" π term comes from in the formula for the density of a Normal random variable.

© 2016, David E. Giles

8 comments:

  1. Great! I've proved it to myself in the past (knowing the outline of the trick). Now can you prove that the real numbers exist given that counting numbers exist? I'm a skeptic. ;D

  2. I have just had a thought, looking at the density plot: it goes from about -3.x to 3.x, so the range of the distribution is about 6.2x. Does this range have anything to do with 2π = 6.28?

    1. No, the range of the Normal variable is from minus infinity to plus infinity. It's just that the area below the density is negligible once you get outside the range of roughly (-3.5 , 3.5).

  3. I got your point, thanks. By "variable", I was getting at the standardized variable, not the original data.

    1. The standardized variable also ranges over the FULL real line.

  4. I remember that the first time I saw this change of coordinates trick in class, we all thought that whoever first came up with it must have been pretty pleased with themselves. But then, whoever first proved that the normal density function has no analytic anti-derivative must have been even more pleased. It certainly surprised me that it was possible to solve the definite integral on the real line without first finding the anti-derivative.

    Joe Blitzstein has a fantastic exposition of this proof in his lectures for Statistics 110, starting around 31:55 here https://youtu.be/72QjzHnYvL0?list=PLLVplP8OIVc8EktkrD3Q8td0GmId7DjW0&t=1915.

    That URL starts at the relevant point, but it's really a pleasure to watch the whole lecture. In fact, it's a pleasure to watch all his lectures; highly recommended.

    1. Phil - thanks for this. I will definitely look at his lectures.
