Tuesday, June 12, 2012

Highly Cited Statistical Papers for Econometricians

There are "classic" research papers in all disciplines. As econometricians we frequently find ourselves making reference to publications by authors who are statisticians. Have you ever wondered how the statistical papers that are important to us actually "stack up" when it comes to a more general audience?

Specifically, how widely cited are these statistical papers?
Ryan and Woodall (2005) addressed this question in their paper, "The most-cited statistical papers". They produced a list of the Top-25 papers, and it's interesting to see some very familiar names and titles on that list.

Here, with their rankings in the Top-25, are some papers that you'll recognize. I'm quoting directly from Ryan and Woodall's paper:


"(1) With 25,869 citations (currently cited 1,984 times per year),

Kaplan, E. L. & Meier, P. (1958) Nonparametric estimation from incomplete observations, Journal of the American Statistical Association, 53, pp. 457–481.

Kaplan & Meier (1958) proposed a non-parametric method for estimating the proportion of items in a population whose lifetime exceeded some specified time t from censored survival data. This type of data is very common in medical studies. This paper not only has by far the highest number of citations of all statistics papers, but it has also been ranked among the top five most cited papers for the entire field of science. 

(2) With 18,193 citations (1,342 per year),

Cox, D. R. (1972) Regression models and life tables, Journal of the Royal Statistical Society, Series B, 34, pp. 187–220.

The topic of this paper is the regression analysis of censored failure time data, which has far-reaching applications in the biomedical sciences. Cox (1972) used a semiparametric model for the hazard function, which has significant advantages over using parametric models for the failure time.

Interestingly, it is reported that a key insight into the statistical analysis method first came to Professor Cox when he was quite ill with the flu and was recalled later only with some difficulty. Cox (1986) also provided some background on the paper.

(4) With 9,504 citations (488 per year),

Marquardt, D. W. (1963) An algorithm for least squares estimation of non-linear parameters, Journal of the Society for Industrial and Applied Mathematics, 2, pp. 431–441.

The Marquardt algorithm proposed in this paper is used to estimate the parameters in a nonlinear model. 

(11) With 4,306 citations (492 per year),

Dempster, A. P., Laird, N. M. & Rubin, D. B. (1977) Maximum likelihood from incomplete data via the EM algorithm (C/R: pp. 22–37), Journal of the Royal Statistical Society, Series B, 39, pp. 1–22.

The Expectation Maximization (EM) algorithm is used for maximum likelihood estimation with data for which some variables are unobserved. 

(15) With 3,444 citations (280 per year),

Akaike, H. (1974) A new look at the statistical model identification, IEEE Transactions on Automatic Control, 19, pp. 716–723.

This is a paper in which Akaike proposed a criterion for estimating the dimensionality of a model using the criterion now known as Akaike’s Information Criterion (AIC). 

(19) With 2,529 citations (120 per year),

Box, G. E. P. & Cox, D. R. (1964) An analysis of transformations, Journal of the Royal Statistical Society, Series B, 26, pp. 211–243 (discussion pp. 244–252).

DeGroot (1987) provided some interesting background on this paper from an interview with Professor Box. Box recounted, for example, that he and Cox were on a committee of the Royal Statistical Society and several people suggested that they collaborate. Their motivation and the idea of the paper sprung, to some extent, from the similarities of their family names.

Box & Cox (1964) presented a very useful family of power transformations that have typically been used to transform the dependent variable in a regression model so as to try to meet the assumptions of homoscedasticity and normality of the error terms. The right side of the model can then be transformed in the same manner so as to retrieve the quality of the fit before the dependent variable was transformed.

(24) With 2,219 citations (240 per year),

Schwarz, G. (1978) Estimating the dimension of a model, Annals of Statistics, 6, pp. 461–464.

Schwarz’s Bayesian Information Criterion (BIC), introduced in this paper, is a criterion for model selection that is often mentioned with Akaike’s AIC criterion."
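
As a quick aside of my own (this is not from Ryan and Woodall's paper), here is a minimal Python sketch of how the AIC and BIC are typically computed for a regression model with Gaussian errors, using the usual textbook forms AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. The function names and the numbers for the two competing models are made up for illustration, and I'm assuming that k counts every estimated parameter, including the error variance.

```python
import math


def gaussian_loglik(rss, n):
    """Maximized Gaussian log-likelihood of a regression with residual sum of squares rss."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)


def aic(loglik, k):
    """Akaike's criterion: a penalty of 2 per estimated parameter."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Schwarz's criterion: a penalty of ln(n) per estimated parameter."""
    return k * math.log(n) - 2 * loglik


# Hypothetical results for two nested models fitted to the same n = 100 observations.
n = 100
small = {"k": 3, "rss": 52.0}  # fewer parameters, slightly worse fit
large = {"k": 6, "rss": 48.0}  # more parameters, slightly better fit

aic_small = aic(gaussian_loglik(small["rss"], n), small["k"])
aic_large = aic(gaussian_loglik(large["rss"], n), large["k"])
bic_small = bic(gaussian_loglik(small["rss"], n), small["k"], n)
bic_large = bic(gaussian_loglik(large["rss"], n), large["k"], n)

print("AIC prefers:", "large" if aic_large < aic_small else "small")
print("BIC prefers:", "large" if bic_large < bic_small else "small")
```

With these made-up numbers the two criteria disagree: the AIC favours the larger model, while the BIC's heavier penalty favours the smaller one. That's a common pattern in practice once n is even moderately large.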

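Similarly, the product-limit idea behind the top-ranked Kaplan and Meier paper is easy to sketch. The following is a bare-bones illustration of my own, with a hypothetical function name and toy data, and it assumes the usual convention that an observation censored at time t is still counted as "at risk" at t. For real work you would use an established survival analysis package.

```python
from collections import Counter


def kaplan_meier(times, observed):
    """Return [(t, S(t))] at each distinct event time.

    times    -- duration for each subject
    observed -- True if the event was seen, False if the subject was censored
    """
    deaths = Counter(t for t, d in zip(times, observed) if d)
    exits = Counter(times)  # everyone leaving the risk set at t (events + censored)
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        if deaths[t]:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths[t] / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Toy data: six subjects, two of them censored (at times 3 and 7).
times = [2, 3, 3, 5, 7, 8]
observed = [True, True, False, True, False, True]
km_curve = kaplan_meier(times, observed)
for t, s in km_curve:
    print(f"t = {t}: S(t) = {s:.4f}")
```

The censored subjects never trigger a drop in the estimated survival curve, but they do shrink the risk set for later event times, which is exactly how the estimator makes use of incomplete observations.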


Reference

Ryan, T. P. and W. H. Woodall, 2005. The most-cited statistical papers. Journal of Applied Statistics, 32, 461–474.


© 2012, David E. Giles
