Showing posts with label Multicollinearity. Show all posts

Monday, April 1, 2019

Some April Reading for Econometricians

Here are my suggestions for this month:
  • Hyndman, R. J., 2019. A brief history of forecasting competitions. Working Paper 03/19, Department of Econometrics and Business Statistics, Monash University.
  • Kuffner, T. A. & S. G. Walker, 2019. Why are p-values controversial? American Statistician, 73, 1-3.
  • Sargan, J. D., 1958. The estimation of economic relationships using instrumental variables. Econometrica, 26, 393-415. (Read for free online.)
  • Sokal, A. D., 1996. Transgressing the boundaries: Towards a transformative hermeneutics of quantum gravity. Social Text, 46/47, 217-252.
  • Zeng, G. & Zeng, E., 2019. On the relationship between multicollinearity and separation in logistic regression. Communications in Statistics - Simulation and Computation, published online.
  • Zhang, X., S. Paul, & Y-G. Yang, 2019. Small sample bias correction or bias reduction? Communications in Statistics - Simulation and Computation, published online.
© 2019, David E. Giles

Tuesday, January 1, 2019

New Year Reading Suggestions for 2019

With a new year upon us, it's time to keep up with new developments -
  • Basu, D., 2018. Can we determine the direction of omitted variable bias of OLS estimators? Working Paper 2018-16, Department of Economics, University of Massachusetts, Amherst.
  • Jiang, B., Y. Lu, & J. Y. Park, 2018. Testing for stationarity at high frequency. Working Paper 2018-9, Department of Economics, University of Sydney. 
  • Psaradakis, Z. & M. Vavra, 2018. Normality tests for dependent data: Large-sample and bootstrap approaches. Communications in Statistics - Simulation and Computation, online.
  • Spanos, A., 2018. Near-collinearity in linear regression revisited: The numerical vs. the statistical perspective. Communications in Statistics - Theory and Methods, online.
  • Thorsrud, L. A., 2018. Words are the new numbers: A newsy coincident index of the business cycle. Journal of Business & Economic Statistics, online. (Working Paper version.)
  • Zhang, J., 2018. The mean relative entropy: An invariant measure of estimation error. American Statistician, online.
© 2019, David E. Giles

Tuesday, January 2, 2018

Econometrics Reading for the New Year

Another year, and lots of exciting reading!
  • Davidson, R. & V. Zinde-Walsh, 2017. Advances in specification testing. Canadian Journal of Economics, online.
  • Dias, G. F. & G. Kapetanios, 2018. Estimation and forecasting in vector autoregressive moving average models for rich datasets. Journal of Econometrics, 202, 75-91.  
  • González-Estrada, E. & J. A. Villaseñor, 2017. An R package for testing goodness of fit: goft. Journal of Statistical Computation and Simulation, 88, 726-751.
  • Hajria, R. B., S. Khardani, & H. Raïssi, 2017. Testing the lag length of vector autoregressive models: A power comparison between portmanteau and Lagrange multiplier tests. Working Paper 2017-03, Escuela de Negocios y Economía, Pontificia Universidad Católica de Valparaíso.
  • McNown, R., C. Y. Sam, & S. K. Goh, 2018. Bootstrapping the autoregressive distributed lag test for cointegration. Applied Economics, 50, 1509-1521.
  • Pesaran, M. H. & R. P. Smith, 2017. Posterior means and precisions of the coefficients in linear models with highly collinear regressors. Working Paper BCAM 1707, Birkbeck, University of London.
  • Yavuz, F. V. & M. D. Ward, 2017. Fostering undergraduate data science. American Statistician, online. 

© 2018, David E. Giles

Sunday, November 5, 2017

Econometrics Reading List for November

Some suggestions...

  • Garcia, J. and D. E. Ramirez, 2017. The successive raising estimator and its relation with the ridge estimator. Communications in Statistics - Theory and Methods, 46, 11123-11142.
  • Silva, I. R., 2017. On the correspondence between frequentist and Bayesian tests. Communications in Statistics - Theory and Methods, online.
  • Steel, M. F. J., 2017. Model averaging and its use in economics. MPRA Paper No. 81568.
  • Teräsvirta, T., 2017. Nonlinear models in macroeconometrics. CREATES Research Paper 2017-32.
  • Witmer, J., 2017. Bayes and MCMC for undergraduates. American Statistician, 71, 259-274.
  • Zimmermann, C., 2015. On the need for a replication journal. Federal Reserve Bank of St. Louis, Working Paper 2015-016A.
© 2017, David E. Giles

Tuesday, August 4, 2015

August reading

Here's my (slightly delayed) August reading list:

  • Ahelegbey, A. F., 2015. The econometrics of networks: A review. Working Paper 2015/13, Department of Economics, University of Venice.
  • Clemens, M. A., 2015. The meaning of failed replications: A review and proposal. IZA Discussion Paper No. 9000.
  • Fair, R. C., 2015. Information limits of aggregate data. Discussion Paper No. 2011, Cowles Foundation, Yale University.
  • Phillips, P. C. B., 2015. Inference in near singular regression. Discussion Paper No. 2009, Cowles Foundation, Yale University.
  • Stock, J. H. and M. W. Watson, 2015. Core inflation and trend inflation. NBER Working Paper 21282.
  • Ullah, A. and X. Zhang, 2015. Grouped model averaging for finite sample size. Working paper, Department of Economics, University of California, Riverside.


© 2015, David E. Giles

Thursday, December 11, 2014

Two Non-Problems!

I just love Dick Startz's "byline" on the EViews 9 Beta Forum:

"Non-normality and collinearity are NOT problems!"

Why do I like it so much? Regarding "normality", see here and here. As for "collinearity", see here, here, here, and here.

© 2014, David E. Giles

Friday, September 19, 2014

Least Squares, Perfect Multicollinearity, & Estimable Functions

This post is essentially an extension of another recent post on this blog. I'll assume that you've read that post, where I discussed the problem of solving linear equations of the form Ax = y, when the matrix A is singular.

Let's look at how this problem might arise in the context of estimating the coefficients of a linear regression model, y = Xβ + ε. In the previous post, I said:
"Least squares estimation leads to the so-called "normal equations":

                         X'Xb = X'y                                                                (1)

If the regressor matrix, X, has k columns, then (1) is a set of k linear equations in the k unknown elements of b. You'll recall that if X has full column rank, k, then (X'X) also has full rank, k, and so (X'X)⁻¹ is well-defined. We then pre-multiply each side of (1) by (X'X)⁻¹, yielding the familiar least squares estimator for β, namely b = (X'X)⁻¹X'y.
So, as long as we don't have "perfect multicollinearity" among the regressors (the columns of X), we can solve (1), and the least squares estimator is defined. More specifically, a unique estimator for each individual element of β is defined.
What if there is perfect multicollinearity, so that the rank of X, and of (X'X), is less than k? In that case, we can't compute (X'X)⁻¹, we can't solve the normal equations in the usual way, and we can't get a unique estimator for the (full) β vector."
I promised that I'd come back to the statement, "we can't get a unique estimator for the (full) β vector". Now's the time to do that.
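As a preview of the argument, here's a minimal numerical sketch (in Python/NumPy, with invented data) of the point at issue: when X has less than full column rank, the normal equations have infinitely many solutions b, but an "estimable function" such as the vector of fitted values, Xb, takes exactly the same value for every one of them.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
x1 = rng.normal(size=n)
x2 = 2.0 * x1                                # perfectly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])    # k = 3 columns, but rank(X) = 2
y = 1.0 + x1 + rng.normal(scale=0.1, size=n)

# One solution of the normal equations X'Xb = X'y: the minimum-norm
# least squares solution, via the Moore-Penrose generalized inverse.
b_min = np.linalg.pinv(X) @ y

# Adding any vector from the null space of X gives another solution.
_, s, Vt = np.linalg.svd(X)
null_vec = Vt[-1]        # right singular vector for the (numerically) zero singular value
b_alt = b_min + 5.0 * null_vec

# The two solutions differ, so b is not unique ...
assert not np.allclose(b_min, b_alt)
# ... but the estimable function Xb (the fitted values) is identical.
assert np.allclose(X @ b_min, X @ b_alt)
```

Here b_min and b_alt are two distinct least squares "estimates" of the full β vector, yet they imply exactly the same fitted values; that's the sense in which some functions of β remain estimable under perfect multicollinearity.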

Thursday, September 18, 2014

"Inverting" Singular Matrices

You can only invert a matrix if that matrix is non-singular. Right? Actually, that's wrong.

You see, there are various sorts of inverse matrices, and most of them apply to the situation where the original matrix is singular.

Before elaborating on this, notice that this fact may be interesting in the context of estimating the coefficients of a linear regression model, y = Xβ + ε. Least squares estimation leads to the so-called "normal equations":

                                     X'Xb = X'y                                                                (1)

If the regressor matrix, X, has k columns, then (1) is a set of k linear equations in the k unknown elements of b. You'll recall that if X has full column rank, k, then (X'X) also has full rank, k, and so (X'X)⁻¹ is well-defined. We then pre-multiply each side of (1) by (X'X)⁻¹, yielding the familiar least squares estimator for β, namely b = (X'X)⁻¹X'y.

So, as long as we don't have "perfect multicollinearity" among the regressors (the columns of X), we can solve (1), and the least squares estimator is defined. More specifically, a unique estimator for each individual element of β is defined.

What if there is perfect multicollinearity, so that the rank of X, and of (X'X), is less than k? In that case, we can't compute (X'X)⁻¹, we can't solve the normal equations in the usual way, and we can't get a unique estimator for the (full) β vector.

Let's look carefully at the last sentence above. There are two parts of it that bear closer scrutiny:
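As a concrete illustration of the sort of "inverse" at stake, here's a Python/NumPy sketch (the matrix is just a made-up example): the Moore-Penrose generalized inverse exists even when the ordinary inverse does not, and it satisfies four defining conditions.

```python
import numpy as np

# A deliberately singular matrix: the second row is twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
assert np.linalg.matrix_rank(A) == 1   # rank-deficient, so A has no ordinary inverse

# The Moore-Penrose generalized inverse always exists, and is unique.
A_plus = np.linalg.pinv(A)

# The four Moore-Penrose conditions:
assert np.allclose(A @ A_plus @ A, A)                # 1. A A+ A = A
assert np.allclose(A_plus @ A @ A_plus, A_plus)      # 2. A+ A A+ = A+
assert np.allclose((A @ A_plus).T, A @ A_plus)       # 3. A A+ is symmetric
assert np.allclose((A_plus @ A).T, A_plus @ A)       # 4. A+ A is symmetric
```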

Saturday, January 11, 2014

Reading for the New Year

Back to work, and back to reading:
  • Basturk, N., C. Cakmakli, S. P. Ceyhan, and H. K. van Dijk, 2013. Historical developments in Bayesian econometrics after Cowles Foundation monographs 10, 14. Discussion Paper 13-191/III, Tinbergen Institute.
  • Bedrick, E. J., 2013. Two useful reformulations of the hazard ratio. American Statistician, in press.
  • Nawata, K. and M. McAleer, 2013. The maximum number of parameters for the Hausman test when the estimators are from different sets of equations.  Discussion Paper 13-197/III, Tinbergen Institute.
  • Shahbaz, M., S. Nasreen, C. H. Ling, and R. Sbia, 2013. Causality between trade openness and energy consumption: What causes what in high, middle and low income countries. MPRA Paper No. 50832.
  • Tibshirani, R., 2011. Regression shrinkage and selection via the lasso: A retrospective. Journal of the Royal Statistical Society, B, 73, 273-282.
  • Zamani, H. and N. Ismail, 2014. Functional form for the zero-inflated generalized Poisson regression model. Communications in Statistics - Theory and Methods, in press.


© 2014, David E. Giles

Thursday, December 12, 2013

When Everything Old is New Again

We see it with clothing styles. Not just hemline lengths, but also the widths of jacket lapels and guys' ties. How wide should the trouser legs be? Cuffs or no cuffs? Leave your clothes in the closet long enough, and there's a good chance they'll be back in style some day!

And so it is with econometrics. Here are just a few examples:

Saturday, August 3, 2013

Unbiased Model Selection Using the Adjusted R-Squared

The coefficient of determination (R²), and its "adjusted" counterpart, really don't impress me much! I often tell students that these statistics are among the last things I look at when appraising the results of estimating a regression model.

Previously, I've had a few things to say about this measure of goodness-of-fit (e.g., here and here). In this post I want to say something positive, for once, about "adjusted" R². Specifically, I'm going to talk about its use as a model-selection criterion.
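To set the stage, here's a small Python sketch (simulated data, with an invented data-generating process) of the mechanics involved: plain R² can never fall when regressors are added, whereas adjusted R² carries a degrees-of-freedom penalty and can fall.

```python
import numpy as np

def r_squared(y, X):
    """Return (R^2, adjusted R^2) for an OLS regression of y on X
    (X is assumed to include an intercept column)."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    r2 = 1.0 - (resid @ resid) / (((y - y.mean()) ** 2).sum())
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return r2, r2_adj

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x])
X_big = np.column_stack([X_small, rng.normal(size=(n, 3))])  # add 3 pure-noise regressors

r2_s, adj_s = r_squared(y, X_small)
r2_b, adj_b = r_squared(y, X_big)

assert r2_b >= r2_s    # plain R^2 never decreases when regressors are added
assert adj_s < r2_s    # the adjustment always penalizes (for k > 1 and R^2 < 1)
```

Whether adj_b ends up above or below adj_s depends on the sample: adjusted R² rises only if the added regressors' F-statistic exceeds one, which is what makes it usable as a (rather weak) model-selection criterion.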

Monday, June 24, 2013

Can You Actually TEST for Multicollinearity?

When you're undertaking a piece of applied econometrics, something that's always on your mind is the need to test the specification of your model, and to test the validity of the various underlying assumptions that you're making. At least - I hope it's always on your mind!

This is an important aspect of any modelling exercise, whether you're working with a linear regression model, or with some nonlinear model such as Logit, Probit, Poisson regression, etc. Most people are pretty good when it comes to such testing in the context of the linear regression model. They seem to be more lax once they move away from that framework. That makes me grumpy, but that's not what this particular post is about.

It's actually about a rather silly question that you sometimes encounter, namely: "Have you tested to see if multicollinearity is a problem for your results?"

I'll explain why this isn't really a sensible question, and why the answer to the question in the title for this post is a resounding "No!"
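For context, the quantities people usually report here, such as variance inflation factors (VIFs), are descriptive measures of how collinear this particular sample of data happens to be; they involve no null hypothesis and no sampling distribution, so they aren't "tests" in any formal sense. A Python sketch (with invented data) of the standard VIF calculation:

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of X (no intercept column).
    These are descriptive diagnostics of the sample, not test statistics."""
    Xc = X - X.mean(axis=0)            # center, which handles the implicit intercept
    out = []
    for j in range(Xc.shape[1]):
        others = np.delete(Xc, j, axis=1)
        # Regress column j on the remaining columns; VIF_j = 1 / (1 - R_j^2)
        b, *_ = np.linalg.lstsq(others, Xc[:, j], rcond=None)
        resid = Xc[:, j] - others @ b
        r2 = 1.0 - (resid @ resid) / (Xc[:, j] @ Xc[:, j])
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)                    # unrelated to the others
v = vif(np.column_stack([x1, x2, x3]))
# x1 and x2 get large VIFs; x3's VIF sits near 1
```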

Thursday, May 2, 2013

Good Old R-Squared!

My students are often horrified when I tell them, truthfully, that one of the last pieces of information that I look at when evaluating the results of an OLS regression, is the coefficient of determination (R²), or its "adjusted" counterpart. Fortunately, it doesn't take long to change their perspective!

After all, we all know that with time-series data it's easy to get a "high" R² value, because of the trend components in the data. With cross-section data, very low R² values are common. For most of us, the signs, magnitudes, and significance of the estimated parameters are of primary interest. Then we worry about testing the assumptions underlying our analysis. R² sits at the bottom of the list of priorities.
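The time-series point is easily demonstrated by simulation. In this Python sketch (invented data), y and x are completely unrelated apart from each containing a deterministic trend, yet the regression delivers a high R²:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
t = np.arange(n)

# Two series with no causal connection, each built around its own trend.
y = 0.5 * t + rng.normal(scale=2.0, size=n)
x = 0.3 * t + rng.normal(scale=2.0, size=n)

# OLS regression of y on x (with intercept)
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
r2 = 1.0 - (resid @ resid) / (((y - y.mean()) ** 2).sum())

# The shared trend alone produces a "high" R^2.
assert r2 > 0.9
```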

Wednesday, March 6, 2013

ARDL Models - Part I

I've been promising, for far too long, to provide a post on ARDL models and bounds testing. Well, I've finally got around to it!

"ARDL" stands for "Autoregressive-Distributed Lag". Regression models of this type have been in use for decades, but in more recent times they have been shown to provide a very valuable vehicle for testing for the presence of long-run relationships between economic time-series.

I'm going to break my discussion of ARDL models into two parts. Here, I'm going to describe, very briefly, what we mean by an ARDL model. This will then provide the background for a second post that will discuss and illustrate how such models can be used to test for cointegration, and estimate long-run and short-run dynamics, even when the variables in question may include a mixture of stationary and non-stationary time-series.
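To fix ideas ahead of Part II: the simplest interesting case, an ARDL(1,1) model, is y(t) = a + phi*y(t-1) + b0*x(t) + b1*x(t-1) + e(t), and the implied long-run multiplier of x on y is (b0 + b1)/(1 - phi). Here's a Python simulation sketch (all parameter values invented for illustration), estimating such a model by OLS:

```python
import numpy as np

# Simulate an ARDL(1,1) process: y[t] = a + phi*y[t-1] + b0*x[t] + b1*x[t-1] + e[t]
rng = np.random.default_rng(3)
n = 5000
a, phi, b0, b1 = 1.0, 0.5, 0.8, 0.3

x = rng.normal(size=n)
e = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = a + phi * y[t - 1] + b0 * x[t] + b1 * x[t - 1] + e[t]

# Estimate the ARDL(1,1) coefficients by OLS on the lagged regressors.
X = np.column_stack([np.ones(n - 1), y[:-1], x[1:], x[:-1]])
coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
a_hat, phi_hat, b0_hat, b1_hat = coef

# Implied long-run multiplier of x on y: (b0 + b1) / (1 - phi)
lr_hat = (b0_hat + b1_hat) / (1.0 - phi_hat)

assert abs(phi_hat - phi) < 0.1
assert abs(lr_hat - (b0 + b1) / (1.0 - phi)) < 0.3   # true value is 2.2
```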

Sunday, October 9, 2011

The Rise & Fall of Multicollinearity

Boris Kaiser in the Public Economics Working Group of the Department of Economics, University of Berne in Switzerland writes:

"As a frequent reader of your blog, I consider it my honour as well as my duty to point your attention to the following graph:

[Graph: Google Books Ngram chart of the frequency of "multicollinearity", 1960-2011]
It shows the relative frequency of appearance of the word in the realm of the literature, contained in Google Books, over the last 50 years.  [1960-2011; DG]  Clicking on this link here, you can see how I generated the graph."

It seems that we're well on the way to the eradication of this grossly over-rated concept, as predicted in my earlier post, "The Second Longest Word in the Econometrics Dictionary". Thank goodness for that!

I'll explain my relief in a subsequent post. Meantime, "thanks a bunch for doing your duty, Boris"!


© 2011, David E. Giles

Thursday, September 15, 2011

Micronumerosity

In a much earlier post I took a jab at the excessive attention paid to the concept of "multicollinearity", historically, in econometrics text books.

Art Goldberger (1930-2009) made numerous important contributions to econometrics, and modelling in the social sciences in general. He wrote several great texts, the earliest of which (Goldberger, 1964) was one of the very first to use the matrix notation that we now take as standard for the linear regression model.

In one of his text books, Art also poked fun at the attention given to multicollinearity, and I'm going to share his parody with you here in full. In a couple of places I've had to replace formulae with words. What follows is from Chapter 23.3 of Goldberger (1991):

Wednesday, March 9, 2011

The Second-Longest Word in the Econometrics Dictionary

The other day I paid a surprise visit to my University's library - it was an honest-to-goodness physical visit that involved putting one foot in front of the other, and straining those remaining neurons that deal with my long-term memory - not one of the virtual on-line visits that now pass for the real thing. Back in the day, when all the world and most of you were young and beautiful, an occasional visit to the library was actually quite therapeutic - a nice break from all of those interruptions in the office. It was great to be back, even though I had to focus hard to avoid tripping over old rabbit burrows while en route, and I was somewhat confused by the fact that my favourite large red book had apparently been moved from the very end of the 3rd shelf in the 4th row from the left on Level 2, sometime since September of 2002.

So, what drove me out into the bitter wasteland of Gordon Head in the dead of March? Well, I wanted to take a look at a dictionary that (shame on me) I thought I didn't own. Did you know that there really is A Dictionary of Econometrics (Darnell, 1994)? It's a very nice volume, and - for the record - the longest word in this dictionary is "heteroskedasticity", with 18 letters. Yes, this word should indeed be spelled with a "k", and not another "c" (see McCulloch, 1985). The second-longest word in that dictionary, with 17 letters is - you guessed it - "multicollinearity". Quite a mouthful, I know, but if you've read this far then I assume that I don't need to bore you by explaining what this blockbuster of a word means.