Econometrics Beat: Dave Giles' Blog: Specification testing


Sunday, September 1, 2019

Back to School Reading

Here we are - it's Labo(u)r Day weekend already in North America, and we all know what that means! It's back to school time.

You'll need a reading list, so here are some suggestions:

Franses, Ph. H. B. F., 2019. Professional forecasters and January. Econometric Institute Research Papers EI2019-25, Erasmus University Rotterdam.
Harvey, A. & R. Ito, 2019. Modeling time series when some observations are zero. Journal of Econometrics, in press.
Leamer, E. E., 1978. Specification Searches: Ad Hoc Inference With Nonexperimental Data. Wiley, New York. (This is a legitimate free download.)
MacKinnon, J. G., 2019. How cluster-robust inference is changing applied econometrics. Working Paper 1413, Economics Department, Queen's University.
Steel, M. F. J., 2019. Model averaging and its use in economics. Mimeo., Department of Statistics, University of Warwick.
Stigler, S. M., 1981. Gauss and the invention of least squares. Annals of Statistics, 9, 465-474.

Monday, July 1, 2019

This month my reading list is a bit different from the usual one. I've taken a look back at past issues of Econometrica and Journal of Econometrics, and selected some important and interesting papers that happened to be published in July issues of those journals.

Here's what I came up with for you:

Aigner, D., C. A. K. Lovell, & P. Schmidt, 1977. Formulation and estimation of stochastic frontier production function models. Journal of Econometrics, 6, 21-37.
Chow, G. C., 1960. Tests of equality between sets of coefficients in two linear regressions. Econometrica, 28, 591-605.
Davidson, R. & J. G. MacKinnon, 1984. Convenient specification tests for logit and probit models. Journal of Econometrics, 25, 241-262.
Dickey, D. A. & W. A. Fuller, 1981. Likelihood ratio statistics for autoregressive time series with a unit root. Econometrica, 49, 1057-1072.
Granger, C. W. J. & P. Newbold, 1974. Spurious regressions in econometrics. Journal of Econometrics, 2, 111-120.
Sargan, J. D., 1961. The maximum likelihood estimation of economic relationships with autoregressive residuals. Econometrica, 29, 414-426.

Friday, June 7, 2019

Clive Granger Special Issue

The recently published Volume 10, No. 1 issue of the European Journal of Pure and Applied Mathematics takes the form of a memorial issue for Clive Granger. You can find the Table of Contents here, and all of the articles can be downloaded freely.

This memorial issue is co-edited by Jennifer Castle and David Hendry. The contributed papers include ones that deal with Forecasting, Cointegration, Nonlinear Time Series, and Model Selection.

This is a fantastic collection of important survey-type papers that you simply must read!

Wednesday, May 1, 2019

May Reading List

Here's a selection of suggested reading for this month:

Athey, S. & G. W. Imbens, 2019. Machine learning methods economists should know about. Mimeo.
Bhagwat, P. & E. Marchand, 2019. On a proper Bayes but inadmissible estimator. American Statistician, online.
Canals, C. & A. Canals, 2019. When is n large enough? Looking for the right sample size to estimate proportions. Journal of Statistical Computation and Simulation, 89, 1887-1898.
Cavaliere, G. & A. Rahbek, 2019. A primer on bootstrap testing of hypotheses in time series models: With an application to double autoregressive models. Discussion Paper 19-03, Department of Economics, University of Copenhagen.
Chudik, A. & G. Georgiadis, 2019. Estimation of impulse response functions when shocks are observed at a higher frequency than outcome variables. Globalization Institute Working Paper 356, Federal Reserve Bank of Dallas.
Reschenhofer, E., 2019. Heteroscedasticity-robust estimation of autocorrelation. Communications in Statistics - Simulation and Computation, 48, 1251-1263.

Sunday, February 3, 2019

February Reading

Now that Groundhog Day is behind us, perhaps we can focus on catching up on our reading?

Desboulets, L. D. D., 2018. A review on variable selection in regression. Econometrics, 6(4), 45.
Efron, B. & C. Morris, 1977. Stein's paradox in statistics. Scientific American, 236(5), 119-127.
Khan, W. M. & A. u. I. Khan, 2018. Most stringent test of independence for time series. Communications in Statistics - Simulation and Computation, online.
Pedroni, P., 2018. Panel cointegration techniques and open challenges. Forthcoming in Panel Data Econometrics, Vol. 1: Theory, Elsevier.
Steel, M. F. J., 2018. Model averaging and its use in economics. MPRA Paper No. 90110.
Tay, A. S. & K. F. Wallis, 2000. Density forecasting: A survey. Journal of Forecasting, 19, 235-254.

Tuesday, January 1, 2019

New Year Reading Suggestions for 2019

With a new year upon us, it's time to keep up with new developments -

Basu, D., 2018. Can we determine the direction of omitted variable bias of OLS estimators? Working Paper 2018-16, Department of Economics, University of Massachusetts, Amherst.
Jiang, B., Y. Lu, & J. Y. Park, 2018. Testing for stationarity at high frequency. Working Paper 2018-9, Department of Economics, University of Sydney.
Psaradakis, Z. & M. Vavra, 2018. Normality tests for dependent data: Large-sample and bootstrap approaches. Communications in Statistics - Simulation and Computation, online.
Spanos, A., 2018. Near-collinearity in linear regression revisited: The numerical vs. the statistical perspective. Communications in Statistics - Theory and Methods, online.
Thorsrud, L. A., 2018. Words are the new numbers: A newsy coincident index of the business cycle. Journal of Business & Economic Statistics, online. (Working Paper version.)
Zhang, J., 2018. The mean relative entropy: An invariant measure of estimation error. American Statistician, online.

Monday, November 5, 2018

Econometrics Reading for November

In between raking leaves and dealing with some early snow, I've put together this list of suggested reading for you:

Beckert, W., 2018. A note on specification testing in some structural regression models. Mimeo., Department of Economics, Mathematics and Statistics, Birkbeck College, University of London.
Clarke, D., 2018. A convenient omitted bias formula for treatment effect models. Economics Letters, in press.
Liu, Y. & Y. Rho, 2018. On the choice of instruments in mixed frequency specification tests. Mimeo., School of Business and Economics, Michigan Technological University.
Lütkepohl, H., A. Staszewska-Bystrova, & P. Winker, 2018. Constructing joint confidence bands for impulse functions of VAR models - A review. Lodz Economic Working Paper 4/2018, Faculty of Economics and Sociology, University of Lodz.
Richardson, A., T. van Florenstein Mulder, & T. Vehbi, 2018. Nowcasting New Zealand GDP using machine learning algorithms.
Słoczyński, T., 2018. A general weighted average representation of the ordinary and two-stage least squares estimands. Mimeo., Department of Economics, Brandeis University.

Friday, June 1, 2018

Sunday, February 11, 2018

Tuesday, January 2, 2018

Econometrics Reading for the New Year

Another year, and lots of exciting reading!

Davidson, R. & V. Zinde-Walsh, 2017. Advances in specification testing. Canadian Journal of Economics, online.
Dias, G. F. & G. Kapetanios, 2018. Estimation and forecasting in vector autoregressive moving average models for rich datasets. Journal of Econometrics, 202, 75-91.
González-Estrada, E. & J. A. Villaseñor, 2017. An R package for testing goodness of fit: goft. Journal of Statistical Computation and Simulation, 88, 726-751.
Hajria, R. B., S. Khardani, & H. Raïssi, 2017. Testing the lag length of vector autoregressive models: A power comparison between portmanteau and Lagrange multiplier tests. Working Paper 2017-03, Escuela de Negocios y Economía, Pontificia Universidad Católica de Valparaíso.
McNown, R., C. Y. Sam, & S. K. Goh, 2018. Bootstrapping the autoregressive distributed lag test for cointegration. Applied Economics, 50, 1509-1521.
Pesaran, M. H. & R. P. Smith, 2017. Posterior means and precisions of the coefficients in linear models with highly collinear regressors. Working Paper BCAM 1707, Birkbeck, University of London.
Yavuz, F. V. & M. D. Ward, 2017. Fostering undergraduate data science. American Statistician, online.

Friday, September 22, 2017

Misclassification in Binary Choice Models

Several years ago I wrote a number of posts about Logit and Probit models, and the Linear Probability Model (LPM). One of those posts (also, see here) dealt with the problems that arise if you mis-classify the dependent variable in such models. That is, in the binary case, if some of your "zeroes" should be "ones", and/or vice versa.

In a conventional linear regression model, measurement errors in the dependent variable are not a big deal. However, the situation is quite different with Logit, Probit, and the LPM.

This issue is taken up in detail in an excellent, recent, paper by Meyer and Mittag (2017), and I commend their paper to you.

To give you an indication of what those authors have to say, this is from their Introduction:

".....the literature has established that misclassification is pervasive and affects estimates, but not how it affects them or what can still be done with contaminated data. This paper characterizes the consequences of misclassification of the dependent variable in binary choice models and assesses whether substantive conclusions can still be drawn from the observed data and if so, which methods to do so work well. We first present a closed form solution for the bias in the linear probability model that allows for simple corrections. For non-linear binary choice models such as the Probit model, we decompose the asymptotic bias into four components. We derive closed form expressions for three bias components and an equation that determines the fourth component. The formulas imply that if misclassification is conditionally random, only the probabilities of misclassification are required to obtain the exact bias in the linear probability model and an approximation in the Probit model. If misclassification is related to the covariates, additional information on this relation is required to assess the (asymptotic) bias, but the results still imply a tendency for the bias to be in the opposite direction of the sign of the coefficient."

This paper includes a wealth of information, including some practical guidelines for practitioners.
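To make the LPM case concrete, here is a minimal simulation sketch. It assumes the simplest setting, in which misclassification is independent of the covariate, and it relies on the standard attenuation result for that setting: the LPM slope is scaled by (1 - a0 - a1), where a0 and a1 are the probabilities of a false "one" and a false "zero". The parameter values, variable names, and data-generating process are all illustrative assumptions, not taken from Meyer and Mittag's paper, whose own expressions should be consulted for the general case.

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

n = 200_000
a0, a1 = 0.05, 0.10          # assumed P(report 1 | true 0) and P(report 0 | true 1)
x = rng.uniform(0.0, 1.0, n)
p = 0.2 + 0.5 * x            # hypothetical true LPM: P(y = 1 | x)
y_true = (rng.uniform(size=n) < p).astype(float)

# Flip outcomes independently of x ("conditionally random" misclassification).
u = rng.uniform(size=n)
flip = np.where(y_true == 1.0, u < a1, u < a0)
y_obs = np.where(flip, 1.0 - y_true, y_true)

slope_true = ols_slope(x, y_true)   # should be close to 0.5
slope_obs = ols_slope(x, y_obs)     # should be close to (1 - a0 - a1) * 0.5
print(slope_true, slope_obs, slope_obs / slope_true, 1 - a0 - a1)
```

In this setting, dividing the estimated slope by (1 - a0 - a1) undoes the attenuation, which is why knowing the misclassification probabilities is enough for a simple correction.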

Reference

Meyer, B. D. and N. Mittag, 2017. Misclassification in binary choice models. Journal of Econometrics, 200, 295-311.

Saturday, July 1, 2017

Canada Day Reading List

I was tempted to offer you a list of 150 items, but I thought better of it!

Hamilton, J. D., 2017. Why you should never use the Hodrick-Prescott filter. Mimeo., Department of Economics, UC San Diego.

Jin, H. and S. Zhang, 2017. Spurious regression between long memory series due to mis-specified structural breaks. Communications in Statistics - Simulation and Computation, in press.

Kiviet, J. F., 2016. Testing the impossible: Identifying exclusion restrictions. Discussion Paper 2016/03, Amsterdam School of Economics, University of Amsterdam.

Lenz, G. and A. Sahn, 2017. Achieving statistical significance with covariates. BITSS Preprint (H/T Arthur Charpentier)

Sephton, P., 2017. Finite sample critical values of the generalized KPSS test. Computational Economics, 50, 161-172.

Stypka, O., P. Grabarczyk, R. Kawka, and M. Wagner, 2017. "Linear" fully modified OLS estimation of cointegrating polynomial regressions. Discussion Paper Nr. 77/2016, SFB 823. (H/T David Stern)

Friday, February 3, 2017

February Reading

Here are some suggestions for your reading list this month:

Aastveit, A., C. Foroni, and F. Ravazzolo, 2016. Density forecasts with MIDAS models. Journal of Applied Econometrics, online.
Chang, C-L. and M. McAleer, 2016. The fiction of full BEKK. Tinbergen Institute Discussion Paper TI 2017-015/III.
Chudik, A., G. Kapetanios, and M.H. Pesaran, 2016. A one-covariate at a time, multiple testing approach to variable selection in high-dimensional linear regression models. Cambridge Working Paper Economics: 1667.
Kleiber, C., 2016. Structural change in (economic) time series. WWZ Working Paper 2016/06, University of Basel.
Romano, J. P. and M. Wolf, 2017. Resurrecting weighted least squares. Journal of Econometrics, 197, 1-19.
Yamada, H., 2017. Several least squares problems related to the Hodrick-Prescott filtering. Communications in Statistics - Theory and Methods, online.

Friday, January 13, 2017

Vintage Years in Econometrics - The 1970's

Continuing on from my earlier posts about vintage years for econometrics in the 1930's, 1940's, 1950's, and 1960's, here's my tasting guide for the 1970's.

Once again, let me note that "in econometrics, what constitutes quality and importance is partly a matter of taste - just like wine! So, not all of you will agree with the choices I've made in the following compilation."

Explaining the Almon Distributed Lag Model

In an earlier post I discussed Shirley Almon's contribution to the estimation of Distributed Lag (DL) models, with her seminal paper in 1965.

That post drew quite a number of email requests for more information about the Almon estimator, and how it fits into the overall scheme of things. In addition, Almon's approach to modelling distributed lags has been used very effectively more recently in the estimation of the so-called MIDAS model. The MIDAS model (developed by Eric Ghysels and his colleagues - e.g., see Ghysels et al., 2004) is designed to handle regression analysis using data with different observation frequencies. The acronym, "MIDAS", stands for "Mixed-Data Sampling". The MIDAS model can be implemented in R, for instance (e.g., see here), as well as in EViews. (I discussed this in this earlier post.)

For these reasons I thought I'd put together this follow-up post by way of an introduction to the Almon DL model, and some of the advantages and pitfalls associated with using it.

Let's take a look.
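As a rough illustration of the idea, the sketch below fits an Almon-type polynomial distributed lag by OLS. It assumes the usual textbook formulation: the lag weights beta_0, ..., beta_q lie on a polynomial of degree p in the lag index, so that q + 1 lag coefficients are replaced by p + 1 polynomial coefficients. The lag length, polynomial degree, simulated data, function names, and the omission of an intercept are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def lag_matrix(x, q):
    """Columns are x_t, x_{t-1}, ..., x_{t-q} for t = q, ..., n-1."""
    n = len(x)
    return np.column_stack([x[q - i : n - i] for i in range(q + 1)])

def almon_fit(y, x, q, p):
    """Estimate lag weights restricted to a degree-p polynomial in the lag index."""
    # H[i, j] = i**j maps polynomial coefficients to lag weights.
    H = np.vander(np.arange(q + 1), p + 1, increasing=True).astype(float)
    Z = lag_matrix(x, q) @ H                  # p + 1 constructed regressors
    a_hat, *_ = np.linalg.lstsq(Z, y[q:], rcond=None)
    return H @ a_hat                          # implied lag weights beta_0..beta_q

# Simulated example (no intercept): true weights follow a quadratic in the lag index.
q, p, n = 6, 2, 2000
a_true = np.array([1.0, 0.5, -0.1])
beta_true = np.vander(np.arange(q + 1), p + 1, increasing=True) @ a_true
x = rng.normal(size=n)
y = np.zeros(n)
y[q:] = lag_matrix(x, q) @ beta_true + rng.normal(size=n - q)

beta_hat = almon_fit(y, x, q, p)
print(np.round(beta_true, 2))
print(np.round(beta_hat, 2))
```

The gain is parsimony: three parameters are estimated rather than seven. The cost is that the estimates are only as good as the polynomial restriction, so choosing q and p badly can distort the estimated lag profile.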

New Year's Reading

New Year's resolution - read more Econometrics!

Bürgi, C., 2016. What do we lose when we average expectations? RPF Working Paper No. 2016-013, Department of Economics, George Washington University.
Cox, D.R., 2016. Some pioneers of modern statistical theory: A personal reflection. Biometrika, 103, 747-759.
Golden, R.M., S.S. Henley, H. White, & T.M. Kashner, 2016. Generalized information matrix tests for detecting model misspecification. Econometrics, 4, 46; doi:10.3390/econometrics4040046.
Phillips, G.D.A. & Y. Xu, 2016. Almost unbiased variance estimation in simultaneous equations models. Working Paper No. E2016/10, Cardiff Business School, University of Cardiff.
Siliverstovs, B., 2016. Short-term forecasting with mixed-frequency data: A MIDASSO approach. Applied Economics, 49, 1326-1343.
Vosseler, A. & E. Weber, 2016. Bayesian analysis of periodic unit roots in the presence of a break. Applied Economics, online.

Best wishes for 2017, and thanks for supporting this blog!

© 2016, David E. Giles

Monday, December 26, 2016

Specification Testing With Very Large Samples

I received the following email query a while back:

"It's my understanding that in the event that you have a large sample size (in my case, > 2million obs) many tests for functional form mis-specification will report statistically significant results purely on the basis that the sample size is large. In this situation, how can one reasonably test for misspecification?"

Well, to begin with, that's absolutely correct - if the sample size is very, very large then almost any null hypothesis will be rejected (at conventional significance levels). For instance, see this earlier post of mine.
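A small simulation sketch can illustrate the point. It adds a practically negligible quadratic term (a hypothetical coefficient of 0.01) to an otherwise linear model and tests that term at a modest sample size and at two million observations. The data-generating process and the function name are assumptions made for the example, and the p-value uses a normal approximation.

```python
import math
import numpy as np

def quad_term_test(n, delta, rng):
    """Fit y on (1, x, x^2); return the x^2 coefficient and its two-sided p-value."""
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + delta * x**2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x, x**2])
    XtX = X.T @ X
    b = np.linalg.solve(XtX, X.T @ y)               # OLS via the normal equations
    resid = y - X @ b
    s2 = resid @ resid / (n - 3)                    # residual variance estimate
    se = math.sqrt(s2 * np.linalg.inv(XtX)[2, 2])   # standard error of the x^2 coefficient
    t = b[2] / se
    p_value = math.erfc(abs(t) / math.sqrt(2.0))    # two-sided, normal approximation
    return b[2], p_value

rng = np.random.default_rng(2)
for n in (1_000, 2_000_000):
    coef, p_value = quad_term_test(n, delta=0.01, rng=rng)
    print(n, coef, p_value)
```

With two million observations the standard error of the quadratic coefficient is of the order of 0.0005, so even a departure from linearity as small as 0.01 is decisively "significant". At the smaller sample size the same departure will usually go undetected. The estimated effect is the same tiny number in both cases, which is why the effect size, rather than the p-value, is the more informative quantity here.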

Shmueli (2012) also addresses this point from the p-value perspective.

But the question was, what can we do in this situation if we want to test for functional form mis-specification?

Shmueli offers some general suggestions that could be applied to this specific question:

Present effect sizes.
Report confidence intervals.
Use (certain types of) charts.

This is followed with an empirical example relating to auction prices for camera sales on eBay, using a sample size of n = 341,136.

To this, I'd add, consider alternative functional forms and use ex post forecast performance and cross-validation to choose a preferred functional form for your model.

You don't always have to use conventional hypothesis testing for this purpose.
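As one way of putting the cross-validation suggestion into practice, the sketch below compares two candidate functional forms by their 5-fold out-of-sample mean squared prediction error. The simulated data (in which the log form is the true one), the two candidates, and the helper name are assumptions made purely for illustration.

```python
import numpy as np

def kfold_mse(features, y, k=5, seed=0):
    """Average out-of-fold squared prediction error for OLS of y on (1, features)."""
    n = len(y)
    X = np.column_stack([np.ones(n), features])
    idx = np.random.default_rng(seed).permutation(n)
    sq_err = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)             # all observations not in this fold
        b, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        sq_err += np.sum((y[fold] - X[fold] @ b) ** 2)
    return sq_err / n

# Hypothetical data: the true relationship is linear in log(x).
rng = np.random.default_rng(4)
n = 5000
x = rng.uniform(1.0, 10.0, n)
y = 1.0 + 2.0 * np.log(x) + rng.normal(scale=0.5, size=n)

cv_linear = kfold_mse(x, y)          # candidate 1: y linear in x
cv_log = kfold_mse(np.log(x), y)     # candidate 2: y linear in log(x)
print(cv_linear, cv_log)
```

The form with the smaller cross-validated error is preferred. Unlike a p-value, this comparison does not become mechanically more decisive as n grows: a functional form wins only to the extent that it actually predicts better.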

Reference

Shmueli, G., 2012. Too big to fail: Large samples and the p-value problem. Mimeo., Institute of Service Science, National Tsing Hua University, Taiwan.

Saturday, December 24, 2016

Sunday, November 6, 2016

The BMST Package for Gretl

As a follow-up to this recent post, I heard again from Artur Tarassow.

You'll see from his email message below that he's extended his earlier work and has prepared a new package for Gretl called "Binary Models Specification Tests".

It's really good to see tests of this type being made available for users of different software - especially free software such as Gretl.

Artur writes:

T. W. Anderson: 1918-2016

Unfortunately, this post deals with the recent loss of one of the great statisticians of our time - Theodore (Ted) W. Anderson.

Ted passed away on 17 September of this year, at the age of 98.

I'm hardly qualified to discuss the numerous, path-breaking, contributions that Ted made as a statistician. You can read about those in DeGroot (1986), for example.

However, it would be remiss of me not to devote some space to reminding readers of this blog about the seminal contributions that Ted Anderson made to the development of econometrics as a discipline. In one of the "ET Interviews", Peter Phillips talks with Ted about his career, his research, and his role in the history of econometrics. I commend that interview to you for a much more complete discussion than I can provide here.

(See this post for information about other ET Interviews).

Ted's path-breaking work on the estimation of simultaneous equations models, under the auspices of the Cowles Commission, was enough in itself to put him in the Econometrics Hall of Fame. He gave us the LIML estimator, and the Anderson and Rubin (1949, 1950) papers are classics of the highest order. It's been interesting to see those authors' test for over-identification being "resurrected" recently by a new generation of econometricians.

There are all sorts of other "snippets" that one can point to as instances where Ted Anderson left his mark on the history and development of econometrics.

For instance, have you ever wondered why we have so many different tests for serial independence of regression errors? Why don't we just use the uniformly most powerful (UMP) test and be done with it? Well, the reason is that no such test (against the alternative of a first-order autoregressive process) exists.

That was established by Anderson (1948), and it led directly to the efforts of Durbin and Watson to develop an "approximately UMP test" for this problem.

As another example, consider the "General-to-Specific" testing methodology that we associate with David Hendry, Grayham Mizon, and other members of the (former?) LSE school of thought in econometrics. Why should we "test down", and not "test up" when developing our models? In other words, why should we start with the most general form of the model, and then successively test and impose restrictions on the model, rather than starting with a simple model and making it increasingly complex? The short answer is that if we take the former approach, and "nest" the successive null and alternative hypotheses in the appropriate manner, then we can appeal to a theorem of Basu to ensure that the successive test statistics are independent. In turn, this means that we can control the overall significance level for the set of tests to what we want it to be. In contrast, this isn't possible if we use a "Simple-to-General" testing strategy.

All of this is spelled out in Anderson (1962) in the context of polynomial regression, and is discussed further in Ted's classic time-series book (Anderson, 1971). The LSE school referred to this in promoting the "General-to-Specific" methodology.
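The arithmetic behind "controlling the overall significance level" is simple once independence holds. If the m successive test statistics are independent under the null, the overall size is one minus the product of the individual (1 - alpha_i) terms. The sketch below, with hypothetical function names, computes the overall size and the common per-test level needed to hit a target overall size.

```python
import math

def overall_size(alphas):
    """Overall significance level of a sequence of independent tests."""
    return 1.0 - math.prod(1.0 - a for a in alphas)

def per_test_size(overall, m):
    """Common per-test level that gives the desired overall level across m independent tests."""
    return 1.0 - (1.0 - overall) ** (1.0 / m)

# Three nested tests, each at the 5% level: overall size is 1 - 0.95**3, roughly 0.1426.
print(overall_size([0.05, 0.05, 0.05]))

# Per-test level needed for an overall 5% level across three tests.
print(per_test_size(0.05, 3))
```

Without independence, as in a "Simple-to-General" search, no such exact calculation is available, and only bounds on the overall size can be given.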

Ted Anderson published many path-breaking papers in statistics and econometrics and he wrote several books - arguably, the two most important are Anderson (1958, 1971). He was a towering figure in the history of econometrics, and with his passing we have lost one of our founding fathers.

References

Anderson, T.W., 1948. On the theory of testing serial correlation. Skandinavisk Aktuarietidskrift, 31, 88-116.

Anderson, T.W., 1958. An Introduction to Multivariate Statistical Analysis. Wiley, New York (2nd. ed. 1984).

Anderson, T.W., 1962. The choice of the degree of a polynomial regression as a multiple decision problem. Annals of Mathematical Statistics, 33, 255-265.

Anderson, T.W., 1971. The Statistical Analysis of Time Series. Wiley, New York.

Anderson, T.W. & H. Rubin, 1949. Estimation of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics, 20, 46-63.

Anderson, T.W. & H. Rubin, 1950. The asymptotic properties of estimates of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics, 21, 570-582.

DeGroot, M.H., 1986. A Conversation with T.W. Anderson: An interview with Morris DeGroot. Statistical Science, 1, 97-105.
