Thursday, November 3, 2016

T. W. Anderson: 1918-2016

Unfortunately, this post deals with the recent loss of one of the great statisticians of our time - Theodore (Ted) W. Anderson.

Ted passed away on 17 September of this year, at the age of 98.

I'm hardly qualified to discuss the numerous path-breaking contributions that Ted made as a statistician. You can read about those in De Groot (1986), for example.

However, it would be remiss of me not to devote some space to reminding readers of this blog about the seminal contributions that Ted Anderson made to the development of econometrics as a discipline. In one of the "ET Interviews", Peter Phillips talks with Ted about his career, his research, and his role in the history of econometrics.  I commend that interview to you for a much more complete discussion than I can provide here.

(See this post for information about other ET Interviews).

Ted's path-breaking work on the estimation of simultaneous equations models, under the auspices of the Cowles Commission, was enough in itself to put him in the Econometrics Hall of Fame. He gave us the LIML estimator, and the Anderson and Rubin (1949, 1950) papers are classics of the highest order. It's been interesting to see those authors' test for over-identification being "resurrected" recently by a new generation of econometricians. 

There are all sorts of other "snippets" that one can point to as instances where Ted Anderson left his mark on the history and development of econometrics.

For instance, have you ever wondered why we have so many different tests for serial independence of regression errors? Why don't we just use the uniformly most powerful (UMP) test and be done with it? Well, the reason is that no such test (against the alternative of a first-order autoregressive process) exists.

That was established by Anderson (1948), and it led directly to the efforts of Durbin and Watson to develop an "approximately UMP test" for this problem.
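
As a quick aside, here's a small illustrative sketch of my own (simulated data, in Python with statsmodels - nothing taken from the papers just mentioned) that computes the Durbin-Watson statistic for an OLS regression whose errors follow an AR(1) process:

    # An illustrative sketch only: simulate a regression with AR(1) errors and
    # compute the Durbin-Watson statistic from the OLS residuals.
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(1948)
    n = 100
    x = rng.normal(size=n)

    # AR(1) errors, so there is some autocorrelation for the test to detect
    e = np.zeros(n)
    for t in range(1, n):
        e[t] = 0.6 * e[t - 1] + rng.normal()
    y = 1.0 + 2.0 * x + e

    ols = sm.OLS(y, sm.add_constant(x)).fit()
    print("Durbin-Watson statistic:", durbin_watson(ols.resid))
    # Values well below 2 point towards positive first-order autocorrelation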

As another example, consider the "General-to-Specific" testing methodology that we associate with David Hendry, Grayham Mizon, and other members of the (former?) LSE school of thought in econometrics. Why should we "test down", and not "test up" when developing our models? In other words, why should we start with the most  general form of the model, and then successively test and impose restrictions on the model, rather than starting with a simple model and making it increasingly complex? The short answer is that if we take the former approach, and "nest" the successive null and alternative hypotheses in the appropriate manner, then we can appeal to a theorem of Basu to ensure that the successive test statistics are independent. In turn, this means that we can control the overall significance level for the set of tests to what we want it to be. In contrast, this isn't possible if we use a "Simple-to-General" testing strategy.

All of this is spelled out in Anderson (1962) in the context of polynomial regression, and is discussed further in Ted's classic time-series book (Anderson, 1971). The LSE school referred to this result when promoting the "General-to-Specific" methodology.
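
Just to fix ideas about why that independence matters, here's a tiny numerical illustration of my own (nothing more than the elementary arithmetic of independent tests - it isn't taken from Anderson's work): if the successive tests are independent and are conducted at individual significance levels of (say) 5%, 2.5%, and 1%, then the overall significance level of the sequence is 1 - (0.95 × 0.975 × 0.99), or roughly 8.3%, and we can choose the individual levels to hit whatever overall level we want.

    # The elementary arithmetic of a sequence of *independent* tests
    # (an illustrative sketch only).
    individual_levels = [0.05, 0.025, 0.01]      # significance levels for the successive tests

    prob_no_rejection = 1.0
    for a in individual_levels:
        prob_no_rejection *= (1.0 - a)           # valid only because the test statistics are independent

    overall_level = 1.0 - prob_no_rejection
    print("Overall significance level:", round(overall_level, 4))   # approx. 0.083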

Ted Anderson published many path-breaking papers in statistics and econometrics, and he wrote several books - arguably, the two most important are Anderson (1958, 1971). He was a towering figure in the history of econometrics, and with his passing we have lost one of our founding fathers.

References

Anderson, T.W., 1948. On the theory of testing serial correlation. Skandinavisk Aktuarietidskrift, 31, 88-116.

Anderson, T.W., 1958. An Introduction to Multivariate Statistical Analysis. Wiley, New York (2nd ed., 1984).

Anderson, T.W., 1962. The choice of the degree of a polynomial regression as a multiple decision problem. Annals of Mathematical Statistics, 33, 255-265.

Anderson, T.W., 1971. The Statistical Analysis of Time Series. Wiley, New York.

Anderson, T.W. & H. Rubin, 1949. Estimation of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics, 20, 46-63.

Anderson, T.W. & H. Rubin, 1950. The asymptotic properties of estimates of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics, 21, 570-582.

De Groot, M.H., 1986. A conversation with T.W. Anderson: An interview with Morris De Groot. Statistical Science, 1, 97-105.

© 2016, David E. Giles

I Was Just Kidding......!

Back in 2011 I wrote a post that I titled, "Dummies for Dummies". It began with the suggestion that I'd written a book of that name, and it included this mock-up of the "cover":

[Mock-up cover image: "Dummies for Dummies"]
(Thanks to former grad. student, Jacob Schwartz, for helping with the pic.)

Although I did say, "O.K., I'm (uncharacteristically) exaggerating just a tad", apparently a few people took me too seriously/literally. I've had a couple of requests for the book, and even one of my colleagues asked me when it would be appearing.

Sadly, no book was ever intended - somehow, I just don't think there's the market for it!

© 2016, David E. Giles

Wednesday, November 2, 2016

Specification Tests for Logit Models Using Gretl

In various earlier posts I've commented on the need for conducting specification tests when working with Logit and Probit models. (For instance, see here, here, and here.)

One of the seminal references on this topic is Davidson and MacKinnon (1984). On my primary website, you can find a comprehensive list of other related references, together with EViews files that will enable you to conduct various specification tests with limited dependent variable (LDV) models.

The link for that material is here.
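
If you'd like to see the general idea in something other than EViews, here's a rough Python sketch of one of those tests - an LM test for heteroskedasticity in a Logit model, set up along the lines of the Davidson-MacKinnon artificial ("BRMR") regression. The simulated data, variable names, and use of statsmodels are my own choices, purely for illustration; this isn't a translation of the EViews programs just mentioned.

    # A rough sketch (illustrative only) of an LM-type test for heteroskedasticity
    # in a logit model, built along the lines of the Davidson-MacKinnon (1984)
    # artificial ("BRMR") regression.
    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(42)

    # Simulated data - purely illustrative
    n = 500
    x1, x2 = rng.normal(size=n), rng.normal(size=n)
    X = sm.add_constant(np.column_stack([x1, x2]))
    y = (X @ np.array([0.5, 1.0, -1.0]) + rng.logistic(size=n) > 0).astype(int)

    # Step 1: estimate the logit model under the null of homoskedasticity
    fit = sm.Logit(y, X).fit(disp=0)
    xb = X @ fit.params
    F = 1.0 / (1.0 + np.exp(-xb))          # fitted probabilities
    f = F * (1.0 - F)                      # logistic density evaluated at xb
    w = np.sqrt(F * (1.0 - F))             # BRMR weights

    # Step 2: the artificial regression; Z holds the variable(s) suspected of
    # driving the heteroskedasticity (here just x1, as an illustration)
    Z = x1.reshape(-1, 1)
    u = (y - F) / w
    R1 = (f[:, None] * X) / w[:, None]
    R2 = -(f * xb)[:, None] * Z / w[:, None]
    aux = sm.OLS(u, np.column_stack([R1, R2])).fit()

    # Step 3: under the null, the explained sum of squares from the artificial
    # regression is asymptotically chi-squared, with d.o.f. = no. of columns of Z
    lm_stat = aux.ess
    print("LM statistic:", lm_stat, "  p-value:", stats.chi2.sf(lm_stat, Z.shape[1]))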

Today I had an email from Artur Tarassow at the University of Hamburg. He wrote:
I know that you're already aware of the open-source econometric software called "Gretl". 
I would like to let you know that I updated my package "LOGIT_HETERO.gfn". This package runs both the tests of homoskedasticity and correct functional form based on your nice program "Logit_hetero.prg" written for EViews.
If you want to have a look at it, simply run:
    set echo off
    set messages off
    install LOGIT_HETERO.gfn
    include LOGIT_HETERO.gfn
    open http://web.uvic.ca/~dgiles/downloads/binary_choice/Logistic_Burr.wf1
    logit Y 0 X1 X2
    matrix M = LOGIT_HETERO(Y,$xlist,$coeff,1)
    print M
Thanks for this, Artur - I'm sure it will be very helpful to many readers of this blog.

Footnote: See Artur's comment below, and the more recent post here. In particular, note Artur's remark: "As a note to your blog readers: The two Logit model related packages “logit_burr.gfn” and “LOGIT_HETERO.gfn” are not available any more, as BMST includes both of them."

Reference

Davidson, R. & J. G. MacKinnon, 1984. Convenient specification tests for logit and probit models. Journal of Econometrics, 25, 241-262.

© 2016, David E. Giles

Tuesday, November 1, 2016

International Prize in Statistics


A few days ago, the inaugural winner of the biennial International Prize in Statistics was announced.

The first recipient of the new award is Sir David Cox, whose work is, of course, well known to econometricians.

The award was made to Sir David for his "Survival Analysis Model Applied in Medicine, Science, and Engineering".

Sunday, October 2, 2016

Some Suggested Reading for October

For your enjoyment:
  • Diebold, F. X. & M. Shin, 2016. Assessing point forecast accuracy by stochastic error distance. NBER Working Paper No. 2516.
  • Franses, P.H., 2016. Yet another look at MIDAS regression. Econometric Institute Report 2016-32.
  • Hillier, G. & F. Martellosio, 2016. Exact properties of the maximum likelihood estimator in spatial autoregressive models. Discussion Paper DP 07/16, Department of Economics, University of Surrey.
  • Li, L., M.J. Holmes, & B.S. Lee, 2016. The asymmetric relationship between executive earnings management and compensation: A panel threshold regression approach. Applied Economics, 48, 5525-5545. 
  • Lütkepohl, H., A. Staszewska-Bystrova, & P. Winker, 2016. Calculating joint confidence bands for impulse response functions using highest density regions. MAGKS Joint Discussion Paper 16-2016.
  • Segnon, M., R. Gupta, S. Bekiros, & M.E. Wohar, 2016. Forecasting U.S. GNP growth: The role of uncertainty. Working Paper 2016-67, Department of Economics, University of Pretoria.

© 2016, David E. Giles

Friday, September 9, 2016

Spreadsheet Errors

Five years ago I wrote a post titled, "Beware of Econometricians Bearing Spreadsheets". 

The take-away message from that post was simple: there's considerable, well-documented evidence that spreadsheets are very, very dangerous when it comes to statistical calculations. That is, if you care about getting the right answers!

Read that post, and the associated references, and you'll see what I mean.

(You might also ask yourself, why would I pay big bucks for commercial software that is of questionable quality when I can use high-quality statistical software such as R, for free?)

This week, a piece in The Economist looks at the shocking record of publications in genomics that fall prey to spreadsheet errors. It's a sorry tale, to be sure. I strongly recommend that you take a look.

Yes, any software can be mis-used. Anyone can make a mistake. We all know that. However, it's not a good situation when a careful and well-informed researcher ends up making blunders just because the software they trust simply isn't up to snuff!  


© 2016, David E. Giles

Friday, September 2, 2016

Dummies with Standardized Data

Recently, I received the following interesting email request:
"I would like to have your assistance regarding a few questions related to regression with standardized variables and a set of dummy variables. First of all, if the variables are standardized (xi-x_bar)/sigma, can I still run the regression with a constant? And, if my dummy variables have 4 categories, do I include all of them without the constant? Or just three and keep the constant in the regression? And, how do we interpret the coefficients of the dummy variables in such a case? I mean, does the conventional interpretation in a single OLS regression still apply?"

Here's my (brief) email response:
"If all of the variables (including the dependent variable) have been standardized, then in general there is no need to include an intercept - in fact, the OLS estimate of its coefficient will be zero (as it should be).
However, if you have (say) 4 categories in the data that you want to allow for with dummy variables, then the usual results apply:
1. You can include all 4 dummies (but no intercept). With standardized data, the estimated coefficients on the dummies will sum to zero (exactly so when the categories are of equal size; more generally, it's the sum weighted by the category proportions that is zero). Each separate coefficient gives you the deviation from zero of the intercept for that category.
OR (equivalently)
2. You can include an intercept and any 3 of the dummies. In this case, it's the implied category intercepts (the intercept itself, and the intercept plus each of the three dummy coefficients) that sum to zero. Suppose that you include the intercept and the dummies D2, D3, and D4. The estimated coefficient of the intercept gives you the intercept effect for category 1. The estimated coefficient for D2 gives you the deviation of the intercept for category 2, from that for category 1, etc."
You can easily verify this by fitting a few OLS regressions, and there's a lot more about regression analysis with standardized data in this earlier post of mine.
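
For anyone who'd like to see this in action, here's a minimal sketch of that verification (my own illustrative example, in Python with statsmodels, using four equal-sized categories - with unequal category sizes it's the proportion-weighted sum of the dummy coefficients that is zero):

    # A minimal sketch of the verification suggested above (illustrative only):
    # standardize y and x, build four equal-sized categories, and compare the
    # two parameterizations of the dummy-variable regression.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(123)
    n = 400
    cat = np.repeat(np.arange(4), n // 4)             # four equal-sized categories
    x = rng.normal(size=n)
    y = 2.0 * x + np.array([1.0, -0.5, 0.25, 2.0])[cat] + rng.normal(size=n)

    # Standardize the dependent variable and the (non-dummy) regressor
    zy = (y - y.mean()) / y.std()
    zx = (x - x.mean()) / x.std()
    D = (cat[:, None] == np.arange(4)).astype(float)  # full set of four dummies

    # Parameterization 1: all four dummies, no intercept
    fit1 = sm.OLS(zy, np.column_stack([zx, D])).fit()
    d_all = fit1.params[1:]
    print("Sum of the dummy coefficients:", d_all.sum())   # ~ 0 (equal-sized categories)

    # Parameterization 2: intercept plus dummies for categories 2-4
    fit2 = sm.OLS(zy, np.column_stack([np.ones(n), zx, D[:, 1:]])).fit()
    a, d_rest = fit2.params[0], fit2.params[2:]

    # The implied category intercepts match those from parameterization 1
    print(np.allclose(d_all, np.concatenate([[a], a + d_rest])))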


© 2016, David E. Giles

Wednesday, August 31, 2016

September Reading

Here are a few suggestions for some interesting reading this month:
© 2016, David E. Giles

Tuesday, July 26, 2016

The Forecasting Performance of Models for Cointegrated Data

Here's an interesting practical question that arises when you're considering different forms of econometric models for forecasting time-series data:
"Which type of model will perform best when the data are non-stationary, and perhaps cointegrated?"
To answer this question we have to think about the alternative models that are available to us; and we also have to decide on what we mean by 'best'. In other words, we have to agree on some sort of loss function or performance criterion for measuring forecast quality.

Notice that the question I've posed above allows for the possibility that the data that we're using are integrated, and the various series we're working with may or may not be cointegrated. This scenario covers a wide range of commonly encountered situations in econometrics.

In an earlier post I discussed some of the basic "mechanics" of forecasting from an Error Correction Model. This type of model is used in the case where our data are non-stationary and cointegrated, and we want to focus on the short-run dynamics of the relationship that we're modelling. However, in that post I deliberately didn't take up the issue of whether or not such a model will out-perform other competing models when it comes to forecasting.
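
To make those mechanics concrete, here's a rough, purely illustrative sketch of my own (simulated data, Python's statsmodels, and arbitrary lag and rank choices - it isn't taken from that earlier post): it fits a VECM to a pair of cointegrated series and produces level forecasts for a short hold-out period.

    # An illustrative sketch only: fit a VECM to two simulated cointegrated
    # series and generate level forecasts for a small hold-out period.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.vector_ar.vecm import VECM

    rng = np.random.default_rng(0)
    T = 300
    trend = np.cumsum(rng.normal(size=T))                   # shared stochastic trend
    data = pd.DataFrame({
        "y1": trend + rng.normal(scale=0.5, size=T),
        "y2": 0.8 * trend + rng.normal(scale=0.5, size=T),  # cointegrated with y1
    })

    train, test = data.iloc[:-8], data.iloc[-8:]            # hold back 8 observations

    # One lagged difference, cointegrating rank 1, and a constant - these
    # choices are made purely for the illustration
    res = VECM(train, k_ar_diff=1, coint_rank=1, deterministic="co").fit()
    forecasts = res.predict(steps=8)                        # forecasts of the levels

    rmse = np.sqrt(((forecasts - test.values) ** 2).mean(axis=0))
    print("Hold-out RMSE, by series:", rmse)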

Let's look at that issue here.