Friday, July 29, 2011

Galton Centenary

The words "regression" and "correlation" trip off our tongues on a daily basis - if not more frequently. Both of them can be attributed to the British polymath, Sir Francis Galton (1822 - 1911). I've blogged a little bit about Galton previously, in The Origin of Our Species.

To commemorate the centenary of his death on 17 January 1911, statisticians are honouring Galton's impressive contributions this year. Putting aside Galton's promotion of eugenics, there is still much to celebrate. Perhaps the most comprehensive source of information about his work and influence is at http://galton.org/. This site includes, among other things, copies of all of his published work - much of which is difficult to obtain elsewhere these days.

Not surprisingly, the Royal Statistical Society has been paying special attention to Galton this year. Among other things there have been some interesting items in their Significance magazine. I'd especially recommend the pieces by Graham Wheeler, Tom Fanshawe and Julian Champkin. In the last of these, look for the link to a BBC radio talk on Galton, by Steve Jones of the Galton Laboratory at University College London!

Finally, if you're looking for inspiration - and who isn't(!) - Galton's own account of his discovery of correlation and regression (originally termed "reversion") makes interesting reading. Titled "Kinship and Correlation", you can find it here.


© 2011, David E. Giles

Thursday, July 28, 2011

Moving Average Errors


"God made X (the data), man made all the rest (especially ε, the error term)."
Emanuel Parzen



A while back I was asked if I could provide some examples of situations where the errors of a regression model would be expected to follow a moving average process. 

Introductory courses in econometrics always discuss the situation where the errors in a model are correlated, implying that the associated covariance matrix is non-scalar. Specifically, at least some of the off-diagonal elements of this matrix are non-zero. Examples that are usually mentioned include: (a) the errors follow a stationary first-order autoregressive (i.e., AR(1)) process; and (b) the errors follow a first-order moving average (i.e., MA(1)) process. Typically, the discussion then deals with tests for independence against a specific alternative process; and estimators that take account of the non-scalar covariance matrix - e.g., the GLS (Aitken) estimator.
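To make the "non-scalar covariance matrix" point concrete, here is a minimal sketch of the error covariance matrices implied by the two textbook cases. The function names, the parameter names (`rho`, `ma_coef`, `sigma2`) and the numerical values are my own illustrative assumptions. The AR(1) case assumes u_t = rho*u_(t-1) + e_t with |rho| < 1, and the MA(1) case assumes u_t = e_t + ma_coef*e_(t-1), where e_t is white noise with variance sigma2.

```python
def ar1_cov(n, rho, sigma2=1.0):
    """Covariance matrix of n stationary AR(1) errors: sigma2 * rho^|t-s| / (1 - rho^2)."""
    if abs(rho) >= 1:
        raise ValueError("stationarity requires |rho| < 1")
    scale = sigma2 / (1.0 - rho ** 2)
    return [[scale * rho ** abs(t - s) for s in range(n)] for t in range(n)]


def ma1_cov(n, ma_coef, sigma2=1.0):
    """Covariance matrix of n MA(1) errors: only the first off-diagonals are non-zero."""
    def entry(t, s):
        lag = abs(t - s)
        if lag == 0:
            return sigma2 * (1.0 + ma_coef ** 2)
        if lag == 1:
            return sigma2 * ma_coef
        return 0.0
    return [[entry(t, s) for s in range(n)] for t in range(n)]


if __name__ == "__main__":
    for name, cov in (("AR(1)", ar1_cov(4, 0.5)), ("MA(1)", ma1_cov(4, 0.5))):
        print(name)
        for row in cov:
            print("  ", [round(v, 4) for v in row])
```

The contrast is the thing to notice: under AR(1) every off-diagonal element is non-zero (the correlations decay geometrically), while under MA(1) the errors are uncorrelated beyond one period.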

It's often easier to motivate AR errors than to think of reasons why MA errors may arise in a regression model in practice. For example, if we're using economic time-series data and if the error term reflects omitted effects, then the latter are likely to be trended and/or cyclical. In each case, this gives rise to an autoregressive process. The omission of a seasonal variable will generally imply errors that follow an AR(4) process; and so on.

However, let's think of some situations where MA regression errors might be expected to arise.

Monday, July 25, 2011

Maximum Likelihood Estimation is Invariably Good!

In a recent post I talked a bit about some of the (large sample) asymptotic properties of Maximum Likelihood Estimators (MLEs). With some care in its construction, the MLE will be consistent, asymptotically efficient, and asymptotically normal. These are all desirable statistical properties.

Most of you will be well aware that MLEs also enjoy an important, and very convenient, algebraic property - we usually call it "invariance". However, you may not know that this property holds in more general circumstances than those that are usually mentioned in econometrics textbooks. I'll come to that shortly.

In case the concept of invariance is news to you, here's what this property is about. Let's suppose that the underlying joint data density, for our vector of data, y, is parameterized in terms of a vector of parameters, θ. The likelihood function (LF) is just the joint data density, p(y | θ) , viewed as if it is a function of the parameters, not the data. That is, the LF is L(θ | y) = p(y | θ). We then find the value of θ that (globally) maximizes L, given the sample of data, y.

That's all fine, but what if our main interest is not in θ itself, but instead we're interested in some function of  θ, say φ = f(θ)? For instance, suppose we are estimating a k-regressor linear regression model of the form:

y = Xβ + ε  ;  ε ~ N[0 , σ²Iₙ] .

Here, θ' = (β' , σ²). You'll know that in this case the MLE of β is just the OLS estimator of that vector, and the MLE of σ² is the sum of the squared residuals, divided by n (not by n-k). The first of these estimators is minimum variance unbiased; while the second estimator is biased. Both estimators are consistent and "best asymptotically normal".

Now, what if we are interested in estimating the non-linear function, φ = f(θ) = (β₁ + β₂β₃)?
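Invariance says that the MLE of φ is simply f evaluated at the MLE of θ, so no fresh maximization is needed. Here is a minimal sketch of that idea for the linear model above. The tiny data set, the helper names (`solve`, `ols_mle`) and the use of plain Gaussian elimination are all my own illustrative assumptions, chosen only so that the arithmetic is easy to follow.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_mle(X, y):
    """Return (beta_hat, sigma2_mle, sigma2_unbiased) for y = X beta + eps, eps normal."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)  # OLS, which is also the MLE of beta
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(k)) for i in range(n)]
    ssr = sum(e * e for e in resid)
    return beta, ssr / n, ssr / (n - k)  # MLE divides by n, not by n - k


# Made-up data: an intercept and two regressors (k = 3), with n = 4 observations.
X = [[1.0, -1.0, -1.0], [1.0, 1.0, -1.0], [1.0, -1.0, 1.0], [1.0, 1.0, 1.0]]
y = [-1.3, 2.3, -0.7, 3.7]

beta_hat, sigma2_mle, sigma2_unbiased = ols_mle(X, y)

# Invariance: plug the MLE of beta into f to get the MLE of phi = beta1 + beta2 * beta3.
phi_hat = beta_hat[0] + beta_hat[1] * beta_hat[2]

if __name__ == "__main__":
    print("beta_hat        =", [round(b, 6) for b in beta_hat])
    print("sigma2 (MLE)    =", round(sigma2_mle, 6))
    print("sigma2 (n - k)  =", round(sigma2_unbiased, 6))
    print("phi_hat         =", round(phi_hat, 6))
```

Keep in mind that invariance tells us only that phi_hat is the MLE of φ. It doesn't say that phi_hat inherits finite-sample properties such as unbiasedness from the estimator of β.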

Saturday, July 23, 2011

National Debt & Regression Models - Units of Measurement Matter!

In a recent post, titled "Debt and Delusion", Robert Shiller draws attention to a very important point amid the current bombardment of news about the debt crisis (crises?).

Referring to the situation in Greece, he comments:

 "Here in the US, it might seem like an image of our future, as public debt comes perilously close to 100% of annual GDP and continues to rise. But maybe this image is just a bit too vivid in our imaginations. Could it be that people think that a country becomes insolvent when its debt exceeds 100% of GDP?

That would clearly be nonsense. After all, debt (which is measured in currency units) and GDP (which is measured in currency units per unit of time) yields a ratio in units of pure time. There is nothing special about using a year as that unit. A year is the time that it takes for the earth to orbit the sun, which, except for seasonal industries like agriculture, has no particular economic significance.

We should remember this from high school science: always pay attention to units of measurement. Get the units wrong and you are totally befuddled.

If economists did not habitually annualize quarterly GDP data and multiply quarterly GDP by four, Greece’s debt-to-GDP ratio would be four times higher than it is now. And if they habitually decadalized GDP, multiplying the quarterly GDP numbers by 40 instead of four, Greece’s debt burden would be 15%. From the standpoint of Greece’s ability to pay, such units would be more relevant, since it doesn’t have to pay off its debts fully in one year (unless the crisis makes it impossible to refinance current debt)." .........
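Shiller's arithmetic is easy to check. The sketch below is only an illustration: the debt and quarterly GDP figures are made-up numbers, chosen so that the annualized ratio is 150% (which is what his 15% "decadalized" figure implies), and the function name is my own.

```python
def debt_to_gdp(debt, quarterly_gdp, quarters_per_period):
    """Debt (currency units) divided by GDP per period (currency units per period).

    The result is measured in periods: the number of periods of output
    that it would take to equal the stock of debt.
    """
    return debt / (quarterly_gdp * quarters_per_period)


# Hypothetical figures, in billions of some currency.
debt = 150.0
quarterly_gdp = 25.0

ratios = {
    "quarter": debt_to_gdp(debt, quarterly_gdp, 1),
    "year": debt_to_gdp(debt, quarterly_gdp, 4),
    "decade": debt_to_gdp(debt, quarterly_gdp, 40),
}

if __name__ == "__main__":
    for period, ratio in ratios.items():
        print(f"debt / GDP per {period}: {ratio:.0%}")
```

The same debt is 6 quarters, 1.5 years, or 0.15 decades of output. Nothing has changed except the unit of time in the denominator.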

Friday, July 22, 2011

On the Importance of Going Global

Since being proposed by Sir Ronald Fisher in a series of papers during the period 1912 to 1934 (Aldrich, 1997), Maximum Likelihood Estimation (MLE) has been one of the "workhorses" of statistical inference, and so it plays a central role in econometrics. It's not the only game in town, of course, especially if you're of the Bayesian persuasion, but even then the likelihood function (in the guise of the joint data density) is a key ingredient in the overall recipe.

MLE provides one of the core "unifying themes" in econometric theory and practice. Many of the particular estimators that we use are just applications of MLE; and many of the tests we use are simply special cases of the core tests that go with MLE - the likelihood ratio, score (Lagrange multiplier), and Wald tests.

The statistical properties that make MLE (and the associated tests) appealing are mostly "asymptotic" in nature. That is, they hold if the sample size becomes infinitely large. There are no guarantees, in general, that MLEs will have "good" properties if the sample size is small. It will depend on the problem in question. So, for example, in some cases MLEs are unbiased, but in others they are not.

More specifically, you'll be aware that (in general) MLEs have the following desirable large-sample properties - they are:
  • (At least) weakly consistent.
  • Asymptotically efficient.
  • Asymptotically normally distributed.
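A small simulation can illustrate the gap between small-sample and large-sample behaviour. The sketch below is my own example, not one taken from this post: it assumes i.i.d. exponential data, for which the MLE of the rate parameter is 1/(sample mean). That estimator is biased upwards in small samples, but the bias fades as n grows. The seed, sample sizes, and number of replications are arbitrary choices.

```python
import random


def exp_rate_mle(sample):
    """MLE of the rate of an exponential distribution: n / sum(y) = 1 / sample mean."""
    return len(sample) / sum(sample)


def average_mle(n, true_rate, reps, rng):
    """Average of the MLE over `reps` simulated samples, each of size n."""
    total = 0.0
    for _ in range(reps):
        sample = [rng.expovariate(true_rate) for _ in range(n)]
        total += exp_rate_mle(sample)
    return total / reps


rng = random.Random(2011)  # fixed seed so that the run is repeatable
true_rate = 2.0
averages = {n: average_mle(n, true_rate, 1000, rng) for n in (5, 50, 500)}

if __name__ == "__main__":
    for n, avg in averages.items():
        print(f"n = {n:3d}: average MLE = {avg:.3f} (true rate = {true_rate})")
```

With n = 5 the average of the estimates sits well above the true rate, while with n = 500 it is very close to it. That's the sense in which the "good" properties of MLEs are large-sample ones.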

Just what does "in general" mean here? ..........

Wednesday, July 20, 2011

So Much For My Bucket List!

Forty-two years ago today, on July 20, 1969 (20:17:40 UTC), the Apollo 11 lunar module landed on the surface of our moon, and a few hours later Neil Armstrong stepped on to it.

I've had an ongoing interest in the space program since the early 1960's. I kind of grew up with it all. Then, in the summer of 1980, while attending the Joint Statistical Meetings in Houston, my friend Keith McLaren and I went on a tour of the Johnson Space Center.

Several things stand out when I think back to that visit.
  • The Apollo 11 capsule was unbelievably small, and the ceramic heat shield was burned almost right through!
  • The Mission Control room was also incredibly small! (The guide said that was everyone's first reaction.)
  • We went inside a "mock-up" of the space shuttle.
  • We walked along a catwalk above a barn of a room that was totally full of more IBM mainframe computers than you can imagine. They were part way through a 12-month-long simulation in preparation for the first shuttle flight the following year.
Not surprisingly, then, one item on my bucket list was to see a shuttle launch.

Mapping the Flow of Scientific Knowledge

If you have an interest in the flow of scientific knowledge, especially across different disciplines, then you'll enjoy the Eigenfactor.org site. It provides some terrific graphical analyses of the map ("graph") of the world of scientific citations.

One thing that you'll get an insight into is the position of economics, as a discipline, relative to other sciences.

You can also use the site to get a slightly different "take" on the rankings of economics and econometrics journals, based on factors that aren't taken into account in simple citation counting. I'm referring to the so-called Eigenfactor Score.

Make sure that you check out the tabs labelled "mapping" and "well-formed".

Moritz Stefaner is responsible for the "well-formed" visualization, and he blogs here. You've seen his very creative work before, in connection with the OECD's Better Life Index, which I've discussed previously, here, here, and  here.

Enjoy!


Saturday, July 16, 2011

A Bayesian and Non-Bayesian Marriage

There's no doubt that the unholy rift between Bayesian and non-Bayesian statisticians, and econometricians, is a thing of the past. And thank goodness for that, because both parties have so much to offer each other.

It wasn't that long ago that it was a case of "them or us" - you had to take sides. That was pretty much the case when I wrote my Ph.D. dissertation in Bayesian econometrics in the early 1970's. These days, most of us are pretty comfortable about using a blend of principles and techniques that are drawn from both camps. It's all about taking a flexible approach, and recognizing that sometimes you need more than one set of tools in your tool box.

Historically, we had the school I'll loosely call the "frequentists" in the blue corner, and the Bayesians in the red corner. I'm not going to try and list the key players in each group. However, the big name that comes to mind in the blue corner is Sir Ronald A. Fisher, who gave us the concept of the "likelihood function", and maximum likelihood estimation.

In the red corner, George E. P. Box has to be numbered as one of the most influential Bayesian statisticians, though his statistical interests were quite broad. Born in England, he later moved to the U.S.A., served as Director of the Statistical Research Group at Princeton University, and  founded the Department of Statistics at the University of Wisconsin (Madison) in 1960.

Apart from anything else, econometricians will know George Box from Box-Jenkins time-series analysis, and the Box-Cox transformation. In addition, George penned the oldest song, "There's No Theorem Like Bayes' Theorem",  in The Bayesian Songbook. See my earlier post on this here.

So, what's the interesting connection between Fisher and Box? Well, among the many professional awards that Box received was the A.S.A.'s R. A. Fisher Lectureship in 1974.

But it gets better. Box married Joan Fisher, the second of Fisher's five daughters.

I like that! 



Wednesday, July 13, 2011

After-Dinner Talks to Die For

We all know that things aren't always what they appear to be. The same is true of people. A well-known experiment from 1970 provides an interesting and entertaining illustration of this.

In the web-only content of the latest issue of Significance magazine (co-published by The American Statistical Association and The Royal Statistical Society), Mikhael Simkin summarizes the story of this experiment. What's new about his piece is that it gives us a link to a video  of the fake lecture that formed the basis of the experiment at the University of Southern California School of Medicine.

The video is worth watching, especially if you have even the slightest knowledge of game theory! I won't say more - just see for yourselves.

I used to make myself available, through my university's speakers' bureau, to give talks to outside groups and organizations. Lions Clubs; the Sons of Norway; Chartered Accountants Supporting Whales and Dolphins; the South Nanaimo Spinning, Weaving and Knitting Guild; and so on. The trouble was that when I was invited to talk on (my then interest) "The Underground Economy in Canada", all they really wanted to know about was how they could get involved in the fun, preferably without being caught. (Especially the Spinners and Weavers!)

However, maybe it's time to get on the speakers' circuit again. But not for free. After all, you get what you pay for - right?

I'll bet there's a good market out there for after-dinner talks on topics such as:
  • "What Every Dental Surgeon Should Know About Unit Roots"
  • "Recent Developments in Cointegration Analysis for Recent Immigrants"
  • "Monte Carlo Simulation in the Post-Impressionist Art Movement" 
  • "Saddlepoint Approximations for Horse-Lovers"
  • "Spatial Econometrics for Realtors"
And if you think I'm kidding, I should point out that on my bookshelf I have a copy of a nice book titled Statistics for Ornithologists. It was a gift some years ago from my friend, Ken White, developer of the SHAZAM econometrics package. If that doesn't offer some possibilities, I don't know what does.

So, if you belong to a club/group/society that's looking for that different, memorable, talk - you know where to find me!

Sunday, July 10, 2011

Prosperity, Thriving, & Economic Growth

A while ago I posted a couple of pieces (here and here) relating to the Better Life Index (BLI) that the OECD released in May of this year. Not surprisingly, the BLI caught the attention of a number of bloggers.

Some of the most thoughtful posts on this topic came from the Australian economist, Winton Bates. In the last few days Winton has extended his earlier analysis and comments in a series of posts that look at other, somewhat similar, indices.

These include the Legatum Prosperity Index, which Winton correlates with the BLI  here, and relates to GDP growth here. If you check these out you'll find links to his earlier posts on the BLI. 

In another interesting post (here), Winton asks "Does economic growth help people to thrive?" He uses Gallup's World Poll data on "thriving", "struggling" and "suffering", and relates it to per capita GDP in different countries. The Gallup data are interesting in their own right, being based on the Cantril Self-Anchoring Striving Scale, which you can learn more about here.

If you have an interest in these various measures of  "well being", and the linkages between them and standard measures of economic output and growth (e.g., GDP), Winton's blog, "Freedom and Flourishing", is definitely worth following.
