## Friday, July 29, 2011

### Galton Centenary

The words "regression" and "correlation" trip off our tongues on a daily basis - if not more frequently. Both of them can be attributed to the British polymath, Sir Francis Galton (1822 - 1911). I've blogged a little bit about Galton previously, in The Origin of Our Species.

To commemorate the centenary of his death on 17 January 1911, statisticians are honouring Galton's impressive contributions this year. Putting aside Galton's promotion of eugenics, there is still much to celebrate. Perhaps the most comprehensive source of information about his work and influence is at http://galton.org/. This site includes, among other things, copies of all of his published work - much of which is difficult to obtain elsewhere these days.

Not surprisingly, the Royal Statistical Society has been paying special attention to Galton this year. Among other things there have been some interesting items in their Significance magazine. I'd especially recommend the pieces by Graham Wheeler, Tom Fanshawe and Julian Champkin. In the last of these, look for the link to a BBC radio talk on Galton, by Steve Jones of the Galton Laboratory at University College London!

Finally, if you're looking for inspiration - and who isn't(!) - Galton's own account of his discovery of correlation and regression (originally termed "reversion") makes interesting reading. Titled "Kinship and Regression", you can find it here.

## Thursday, July 28, 2011

### Moving Average Errors

"God made X (the data), man made all the rest (especially ε, the error term)."
Emanuel Parzen

A while back I was asked if I could provide some examples of situations where the errors of a regression model would be expected to follow a moving average process.

Introductory courses in econometrics always discuss the situation where the errors in a model are correlated, implying that the associated covariance matrix is non-scalar. Specifically, at least some of the off-diagonal elements of this matrix are non-zero. Examples that are usually mentioned include: (a) the errors follow a stationary first-order autoregressive (i.e., AR(1)) process; and (b) the errors follow a first-order moving average (i.e., MA(1)) process. Typically, the discussion then deals with tests for independence against a specific alternative process; and estimators that take account of the non-scalar covariance matrix - e.g., the GLS (Aitken) estimator.

It's often easier to motivate AR errors than to think of reasons why MA errors may arise in a regression model in practice. For example, if we're using economic time-series data and if the error term reflects omitted effects, then the latter are likely to be trended and/or cyclical. In each case, this gives rise to an autoregressive process. The omission of a seasonal variable will generally imply errors that follow an AR(4) process (with quarterly data); and so on.

However, let's think of some situations where MA regression errors might be expected to arise.
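Before getting to those situations, here's a minimal simulation sketch of what an MA(1) error process looks like. The parameter values, sample size, and seed are illustrative choices of my own, not anything from a real application; the point is just that an MA(1) error is correlated at lag one (with autocorrelation θ/(1+θ²)) and uncorrelated beyond that - quite unlike an AR(1) error, whose autocorrelations die out geometrically.

```python
import numpy as np

# Illustrative sketch (my own parameter choices): simulate MA(1) errors,
# e_t = u_t + theta * u_{t-1}, with u_t ~ N(0, 1) white noise.
rng = np.random.default_rng(42)
n, theta = 100_000, 0.5

u = rng.standard_normal(n + 1)
e = u[1:] + theta * u[:-1]              # MA(1) errors, length n

# Sample autocorrelations: theory says theta/(1+theta^2) = 0.4 at lag 1,
# and exactly zero at lag 2 and beyond.
rho1 = np.corrcoef(e[1:], e[:-1])[0, 1]
rho2 = np.corrcoef(e[2:], e[:-2])[0, 1]
print(round(rho1, 2), round(rho2, 2))   # roughly 0.4 and 0.0
```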

## Monday, July 25, 2011

### Maximum Likelihood Estimation is Invariably Good!

In a recent post I talked a bit about some of the (large sample) asymptotic properties of Maximum Likelihood Estimators (MLEs). With some care in its construction, the MLE will be consistent, asymptotically efficient, and asymptotically normal. These are all desirable statistical properties.

Most of you will be well aware that MLEs also enjoy an important, and very convenient, algebraic property - we usually call it "invariance". However, you may not know that this property holds in more general circumstances than those that are usually mentioned in econometrics textbooks. I'll come to that shortly.

In case the concept of invariance is news to you, here's what this property is about. Let's suppose that the underlying joint data density, for our vector of data, y, is parameterized in terms of a vector of parameters, θ. The likelihood function (LF) is just the joint data density, p(y | θ) , viewed as if it is a function of the parameters, not the data. That is, the LF is L(θ | y) = p(y | θ). We then find the value of θ that (globally) maximizes L, given the sample of data, y.

That's all fine, but what if our main interest is not in θ itself, but instead we're interested in some function of θ, say φ = f(θ)? For instance, suppose we are estimating a k-regressor linear regression model of the form:

y = Xβ + ε  ;  ε ~ N[0 , σ2In] .

Here, θ' = (β' , σ2). You'll know that in this case the MLE of β is just the OLS estimator of that vector, and the MLE of σ2 is the sum of the squared residuals, divided by n (not by n-k). The first of these estimators is minimum variance unbiased; while the second estimator is biased. Both estimators are consistent and "best asymptotically normal".

Now, what if we are interested in estimating the non-linear function, φ = f(θ) = (β1 + β2β3)?
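By the invariance property, the MLE of φ is simply f evaluated at the MLE of θ. A small simulated sketch of this (the data, coefficients, and seed below are illustrative assumptions of my own, not from the post):

```python
import numpy as np

# Sketch of MLE invariance in the normal linear model: the MLE of beta is
# OLS, so the MLE of phi = beta_1 + beta_2 * beta_3 is the same function
# evaluated at the OLS estimates. Illustrative simulated data.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
beta_true = np.array([1.0, 2.0, 3.0])
y = X @ beta_true + rng.standard_normal(n)

b = np.linalg.lstsq(X, y, rcond=None)[0]    # OLS = MLE of beta
sigma2_mle = np.sum((y - X @ b) ** 2) / n   # MLE of sigma^2: divide by n, not n-k

phi_mle = b[0] + b[1] * b[2]                # MLE of phi, by invariance
print(round(phi_mle, 1))                    # close to the true 1 + 2*3 = 7
```

No separate maximization over φ is needed; that's what makes the property so convenient in practice.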

## Saturday, July 23, 2011

### National Debt & Regression Models - Units of Measurement Matter!

In a recent post, titled "Debt and Delusion", Robert Shiller draws attention to a very important point amid the current bombardment of news about the debt crisis (crises?).

Referring to the situation in Greece, he comments:

"Here in the US, it might seem like an image of our future, as public debt comes perilously close to 100% of annual GDP and continues to rise. But maybe this image is just a bit too vivid in our imaginations. Could it be that people think that a country becomes insolvent when its debt exceeds 100% of GDP?

That would clearly be nonsense. After all, dividing debt (which is measured in currency units) by GDP (which is measured in currency units per unit of time) yields a ratio in units of pure time. There is nothing special about using a year as that unit. A year is the time that it takes for the earth to orbit the sun, which, except for seasonal industries like agriculture, has no particular economic significance.

We should remember this from high school science: always pay attention to units of measurement. Get the units wrong and you are totally befuddled.

If economists did not habitually annualize quarterly GDP data and multiply quarterly GDP by four, Greece’s debt-to-GDP ratio would be four times higher than it is now. And if they habitually decadalized GDP, multiplying the quarterly GDP numbers by 40 instead of four, Greece’s debt burden would be 15%. From the standpoint of Greece’s ability to pay, such units would be more relevant, since it doesn’t have to pay off its debts fully in one year (unless the crisis makes it impossible to refinance current debt)." .........
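Shiller's units point is easy to verify with a toy calculation (the debt and GDP figures below are made up purely for illustration - they're not Greece's actual numbers):

```python
# Toy illustration of the units point: debt is a stock (currency units),
# GDP is a flow (currency units per period), so the "ratio" has units of
# time and its numerical value depends entirely on the period chosen.
debt = 300.0            # billions of currency units (illustrative)
quarterly_gdp = 75.0    # billions per quarter (illustrative)

annual = debt / (4 * quarterly_gdp)     # the conventional annualized ratio
quarterly = debt / quarterly_gdp        # same debt, against quarterly GDP
decadal = debt / (40 * quarterly_gdp)   # same debt, against decadal GDP

print(annual, quarterly, decadal)       # 1.0, 4.0, 0.1
```

Same country, same debt: "100% of GDP", "400% of GDP", or "10% of GDP", depending only on the accounting period.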

## Friday, July 22, 2011

### On the Importance of Going Global

Since being proposed by Sir Ronald Fisher in a series of papers during the period 1912 to 1934 (Aldrich, 1977), Maximum Likelihood Estimation (MLE) has been one of the "workhorses" of statistical inference, and so it plays a central role in econometrics. It's not the only game in town, of course, especially if you're of the Bayesian persuasion, but even then the likelihood function (in the guise of the joint data density) is a key ingredient in the overall recipe.

MLE provides one of the core "unifying themes" in econometric theory and practice. Many of the particular estimators that we use are just applications of MLE; and many of the tests we use are simply special cases of the core tests that go with MLE - the likelihood ratio; score (Lagrange multiplier), and Wald tests.

The statistical properties that make MLE (and the associated tests) appealing are mostly "asymptotic" in nature. That is, they hold if the sample size becomes infinitely large. There are no guarantees, in general, that MLEs will have "good" properties if the sample size is small. It will depend on the problem in question. So, for example, in some cases MLEs are unbiased, but in others they are not.

More specifically, you'll be aware that (in general) MLEs have the following desirable large-sample properties - they are:
• (At least) weakly consistent.
• Asymptotically efficient.
• Asymptotically normally distributed.
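As a quick sketch of what (weak) consistency looks like in practice, here's a small simulation using the MLE of a normal variance - an estimator that is biased in small samples but consistent. The sample sizes, replication count, and seed are illustrative choices of my own:

```python
import numpy as np

# Sketch of (weak) consistency: the normal-variance MLE divides by n, so
# it is biased downward in small samples, but the bias (and the sampling
# variability) vanish as n grows. All settings here are illustrative.
rng = np.random.default_rng(1)
true_var = 4.0
avg_mle = {}

for n in (10, 1000):
    draws = rng.normal(0.0, 2.0, size=(2000, n))      # 2000 replications
    xbar = draws.mean(axis=1, keepdims=True)
    mle = ((draws - xbar) ** 2).mean(axis=1)          # divides by n: the MLE
    avg_mle[n] = mle.mean()

# E[MLE] = ((n-1)/n) * true_var: about 3.6 at n = 10, about 4.0 at n = 1000
print(avg_mle)
```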

Just what does "in general" mean here? ..........

## Wednesday, July 20, 2011

### So Much For My Bucket List!

Forty two years ago today, on July 20, 1969 (20:17:40 UTC), Neil Armstrong stepped on to the surface of our moon.

I've had an ongoing interest in the space program since the early 1960's. I kind of grew up with it all. Then, in the summer of 1980, while attending the Joint Statistical Meetings in Houston, my friend Keith McLaren and I went on a tour of the Johnson Space Center.

Several things stand out when I think back to that visit.
• The Apollo 11 capsule was unbelievably small, and the ceramic heat shield was burned almost right through!
• The Mission Control room was also incredibly small! (The guide said that was everyone's first reaction.)
• We went inside a "mock-up" of the space shuttle.
• We walked along a catwalk above a barn of a room that was totally full of more IBM mainframe computers than you can imagine. They were part way through a 12 month long simulation in preparation for the first shuttle flight the following year.
Not surprisingly, then, one item on my bucket list was to see a shuttle launch.

### Mapping the Flow of Scientific Knowledge

If you have an interest in the flow of scientific knowledge, especially across different disciplines, then you'll enjoy the Eigenfactor.org site. It provides some terrific graphical analyses of the map ("graph") of the world of scientific citations.

One thing that you'll get an insight into is the position of economics, as a discipline, relative to other sciences.

You can also use the site to get a slightly different "take" on the rankings of economics and econometrics journals, based on factors that aren't taken into account in simple citation counting. I'm referring to the so-called Eigenfactor Score.

Make sure that you check out the tabs labelled "mapping" and "well-formed".

Moritz Stefaner is responsible for the "well-formed" visualization, and he blogs here. You've seen his very creative work before, in connection with the OECD's Better Life Index, which I've discussed previously, here, here, and here.

Enjoy!

## Saturday, July 16, 2011

### A Bayesian and Non-Bayesian Marriage

There's no doubt that the unholy rift between Bayesian and non-Bayesian statisticians, and econometricians, is a thing of the past. And thank goodness for that, because both parties have so much to offer each other.

It wasn't that long ago that it was a case of "them or us" - you had to take sides. That was pretty much the case when I wrote my Ph.D. dissertation in Bayesian econometrics in the early 1970's. These days, most of us are pretty comfortable about using a blend of principles and techniques that are drawn from both camps. It's all about taking a flexible approach, and recognizing that sometimes you need more than one set of tools in your tool box.

Historically, we had the school I'll loosely call the "frequentists" in the blue corner, and the Bayesians in the red corner. I'm not going to try and list the key players in each group. However, the big name that comes to mind in the blue corner is Sir Ronald A. Fisher, who gave us the concept of the "likelihood function", and maximum likelihood estimation.

In the red corner, George E. P. Box has to be numbered as one of the most influential Bayesian statisticians, though his statistical interests were quite broad. Born in England, he later moved to the U.S.A., served as Director of the Statistical Research Group at Princeton University, and  founded the Department of Statistics at the University of Wisconsin (Madison) in 1960.

Apart from anything else, econometricians will know George Box from Box-Jenkins time-series analysis, and the Box-Cox transformation. In addition, George penned the oldest song, "There's No Theorem Like Bayes' Theorem",  in The Bayesian Songbook. See my earlier post on this here.

So, what's the interesting connection between Fisher and Box? Well, among the many professional awards that Box received was the A.S.A.'s R. A. Fisher Lectureship in 1974.

But it gets better. Box married Joan Fisher, the second of Fisher's five daughters.

I like that!

## Wednesday, July 13, 2011

### After-Dinner Talks to Die For

We all know that things aren't always what they appear to be. The same is true of people. A well known experiment from 1970 provides an interesting and entertaining illustration of this.

In the web-only content of the latest issue of Significance magazine (co-published by The American Statistical Association and The Royal Statistical Society), Mikhael Simkin summarizes the story of this experiment. What's new about his piece is that it gives us a link to a video of the fake lecture that formed the basis of the experiment at the University of Southern California School of Medicine.

The video is worth watching, especially if you have even the slightest knowledge of game theory! I won't say more - just see for yourselves.

I used to make myself available, through my university's speakers' bureau, to give talks to outside groups and organizations. Lions Clubs; the Sons of Norway; Chartered Accountants Supporting Whales and Dolphins; the South Nanaimo Spinning, Weaving and Knitting Guild; and so on. The trouble was that when I was invited to talk on (my then interest) "The Underground Economy in Canada", all they really wanted to know about was how they could get involved in the fun, preferably without being caught. (Especially the Spinners and Weavers!)

However, maybe it's time to get on the speakers' circuit again. But not for free. After all, you get what you pay for - right?

I'll bet there's a good market out there for after-dinner talks on topics such as:
• "What Every Dental Surgeon Should Know About Unit Roots"
• "Recent Developments in Cointegration Analysis for Recent Immigrants"
• "Monte Carlo Simulation in the Post-Impressionist Art Movement"
• "Spatial Econometrics for Realtors"
And if you think I'm kidding, I should point out that in my bookshelf I have a copy of a nice book titled Statistics for Ornithologists. It was a gift some years ago from my friend, Ken White, developer of the SHAZAM econometrics package. If that doesn't offer some possibilities, I don't know what does.

So, if you belong to a club/group/society that's looking for that different, memorable, talk - you know where to find me!

## Sunday, July 10, 2011

### Prosperity, Thriving, & Economic Growth

A while ago I posted a couple of pieces (here and here) relating to the Better Life Index (BLI) that the OECD released in May of this year. Not surprisingly, the BLI caught the attention of a number of bloggers.

Some of the most thoughtful posts on this topic came from the Australian economist, Winton Bates. In the last few days Winton has extended his earlier analysis and comments in a series of posts that look at other, somewhat similar, indices.

These include the Legatum Prosperity Index, which Winton correlates with the BLI here, and relates to GDP growth here. If you check these out you'll find links to his earlier posts on the BLI.

In another interesting post (here), Winton asks "Does economic growth help people to thrive?" He uses Gallup's World Poll data on "thriving", "struggling" and "suffering", and relates it to per capita GDP in different countries. The Gallup data are interesting in their own right, being based on the Cantril Self-Anchoring Striving Scale, which you can learn more about here.

If you have an interest in these various measures of  "well being", and the linkages between them and standard measures of economic output and growth (e.g., GDP), Winton's blog, "Freedom and Flourishing", is definitely worth following.

## Saturday, July 9, 2011

### Econometrics Without Borders?

We're all familiar with the "Doctors Without Borders" organization, and the valuable international medical work that it performs. Perhaps you didn't know that there's also a group called "Statistics Without Borders"? To quote from their web site, their mission statement is as follows:

"Statistics Without Borders (SWB) is an apolitical organization under the auspices of the American Statistical Association, comprised entirely of volunteers, that provides pro bono statistical consulting and assistance to organizations and government agencies in support of these organizations' not-for-profit efforts to deal with international health issues (broadly defined)."

It's great to see The American Statistical Association, which I've been a member of for 38 years, supporting this type of venture.

Now, a new initiative, provisionally called "Data Without Borders" (DWB), has been established by data scientist, Jake Porway. You can read about it on Porway's web site, of course, and also in a recent post on The Guardian's DataBlog here. Briefly, the aim is to match important data from not-for-profit organizations with experts in the analysis of data. According to a recent item in the Royal Statistical Society's newsletter, RSSeNEWS, there were over 300 expressions of interest, internationally, within the first 24 hours of DWB being announced.

So, don't let anyone ever tell you that being a "quant" who works with data can't be socially meaningful. Even econometricians can make a difference if we want to!

## Friday, July 8, 2011

The choice of your co-authors in academic work can be very important - you all have to be able to bring something to the table, and hopefully there'll be enough synergies to ensure that the paper (or book) is better than any of you could have achieved individually.

I must say that I've certainly been extremely fortunate with my own past and current co-authorship "liaisons".

There's plenty that could be said about deciding on the order of the authors - but I'll leave that for another post. That being said, there may be some other interesting considerations.

I was recently re-reading Stephen Hawking's A Brief History of Time. In Chapter 6 Hawking has a great story about an important physics paper published in Physical Review in 1948. Two of the authors were George Gamow and his Ph.D. student, Ralph Alpher, at George Washington University. Gamow persuaded the renowned physicist Hans Bethe to join them as a co-author, simply so that the final line-up of authors was Alpher, Bethe and Gamow.

A much more detailed account of the background to the "Alpha-Beta-Gamma" paper is provided by Simon Singh, in his book Big Bang: The Origin of the Universe. The paper, titled simply 'The Origin of Chemical Elements' was (Singh, p.323) "...a milestone in the Big Bang versus eternal universe debate", and "....the first major triumph for the Big Bang model since Hubble had observed and measured the redshifts of galaxies." (Singh, p.319).

Unfortunately, there was a darker side to this story that I was unaware of until Jeremy Austin brought it to my attention in a comment to the original version of this post (see below).

However, it certainly made me think about some possibilities for interesting co-authorships in other disciplines.

For instance, ........ they lived centuries apart, and their contributions to pure mathematics don't really overlap, but wouldn't it be nice if we could time-travel and get (Fields Medal winner) Klaus Roth to team up with Michel Rolle. A kind of mathematical jam session!

Moving closer to home, I see that both Yixiao Sun and Hyungsik Roger Moon have (separately) co-authored papers with Peter Phillips. However, it seems the first two of these econometricians haven't taken the opportunity to co-author a paper together. Pity - I'd enjoy that!

And what about trying to persuade Fallaw Sowell and John Pepper to cook up some interesting econometrics for us? Or perhaps my friend John Knight could collaborate with Rohit Deo? They'd undoubtedly produce a paper that was the epitome of econometric clarity.

So, whatever your last name is, choose your co-authors carefully, and maybe even have a bit of fun in the process!

## Thursday, July 7, 2011

### Alexander Aitken

Can you imagine what it would be like trying to learn and teach econometrics without the use of matrix algebra? O.K., I know that some of you are probably thinking, "that would be great!" But give it some serious thought. We'd be extremely limited in what we could do. It would be a nightmare to go beyond the absolute basics.

Only in the 1960's, with the classic texts by Johnston (1963) and Goldberger (1964), did the use of matrix algebra become standard practice in the teaching of econometrics. We used that first edition of Johnston's text in the first undergraduate econometrics course I took. Thank goodness!

Every student of econometrics is indebted to Alexander Craig Aitken (1895 - 1967) for his development of what is now the standard vector/matrix notation for the linear regression model (and its extensions). Econometricians also use the Generalised Least Squares ("Aitken") estimator when this model has a non-standard error covariance matrix.

The seminal Generalised Least Squares contribution, together with the first matrix formulation of the linear regression model, appeared in Aitken's paper, "On Least Squares and Linear Combinations of Observations", Proceedings of the Royal Society of Edinburgh, 1935, vol. 55, pp. 42-48. In this paper we find the well-known extension of the Gauss-Markov Theorem to the case where the regression error vector has a non-scalar covariance matrix - the Aitken estimator is shown to be "Best Linear Unbiased".
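In matrix form, the Aitken estimator is b = (X'V⁻¹X)⁻¹X'V⁻¹y, where V is the (non-scalar) error covariance matrix. Here's a minimal sketch of it in action; the AR(1)-type covariance, sample size, and coefficients are illustrative assumptions of my own, not anything from Aitken's paper:

```python
import numpy as np

# Sketch of the Aitken (GLS) estimator, b = (X' V^-1 X)^-1 X' V^-1 y, for
# y = X beta + e where e has a non-scalar covariance matrix V. Here V is
# an illustrative AR(1)-type covariance; the data are simulated.
rng = np.random.default_rng(7)
n, rho = 200, 0.8

X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta = np.array([1.0, 2.0])

# AR(1) error covariance: V[i, j] = rho^|i - j|
idx = np.arange(n)
V = rho ** np.abs(idx[:, None] - idx[None, :])
e = rng.multivariate_normal(np.zeros(n), V)
y = X @ beta + e

Vinv = np.linalg.inv(V)
b_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(b_gls)    # close to the true (1, 2)
```

When V is unknown, of course, it has to be estimated (feasible GLS), but the algebra of the estimator is exactly Aitken's.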

Aitken's most influential statistical paper was co-authored with (another New Zealander) Harold Silverstone - "On the Estimation of Statistical Parameters", Proceedings of the Royal Society of Edinburgh, 1942, vol. 61, pp. 186-194. This paper extends earlier ideas of Sir Ronald Fisher to derive, for the unbiased case, the result that we now usually refer to as the "Cramér-Rao Inequality" for the lower bound on the variance of an estimator. Interestingly, this contribution pre-dates Rao's 1945 work and Cramér's 1946 paper.

So who was Alexander Craig Aitken?

## Monday, July 4, 2011

### The Econometric Game

If you're not familiar with The Econometric Game, you might find it interesting. It's a great concept, and it's become an important international event for graduate students in Econometrics.

I was especially pleased when a team from my old department at Monash University, in Melbourne, Australia, won the Game in 2010.