Thursday, October 31, 2019

It's Time to Go

When I released my first post on the blog on February 20th, 2011, I really wasn't sure what to expect! After all, I was aiming to reach a somewhat niche audience.

Well, 949 posts and 7.4 million page-hits later, this blog has greatly exceeded my wildest expectations. 

However, I'm now retired and I turned 70 three months ago. I've decided to call it quits, and this is my final post.

I'd rather make a definite decision about this than have the blog just fizzle into nothingness.

For now, the Econometrics Beat blog will remain visible, but it will be closed for further comments and questions.

I've had a lot of fun and learned a great deal through this blog. I owe a debt of gratitude to all of you who've followed my posts, made suggestions, asked questions, made helpful comments, and drawn errors to my attention.

I just hope that it's been as positive an experience for you as it has been for me.

Thank you - and enjoy your Econometrics!

© 2019, David E. Giles

Wednesday, October 30, 2019

Everything's Significant When You Have Lots of Data

Well........, not really!

It might seem that way on the face of it, but that's because you're probably using a totally inappropriate measure of what's (statistically) significant, and what's not.

I talked a bit about this issue in a previous post, where I said:
"Granger (1998, 2003) has reminded us that if the sample size is sufficiently large, then it's virtually impossible not to reject almost any hypothesis. So, if the sample is very large and the p-values associated with the estimated coefficients in a regression model are of the order of, say, 0.10 or even 0.05, then this is really bad news. Much, much smaller p-values are needed before we get all excited about 'statistically significant' results when the sample size is in the thousands, or even bigger."
This general point, namely that our chosen significance level should be decreased as the sample size grows, is pretty well understood by most statisticians and econometricians. (For example, see Good, 1982.) However, it's usually ignored by the authors of empirical economics studies based on samples of thousands (or more) observations. Moreover, a lot of practitioners seem to be unsure of just how much they should revise their significance levels (or re-interpret their p-values) in such circumstances.
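
To see the problem concretely, here's a minimal simulation sketch of my own (it assumes numpy and scipy are available; none of this code appears in the original post) in which an economically negligible slope of 0.01 becomes "highly significant" purely because the sample is huge:

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(123)
  for n in (100, 10_000, 1_000_000):
      x = rng.normal(size=n)
      y = 0.01 * x + rng.normal(size=n)   # true slope is a trivial 0.01
      res = stats.linregress(x, y)
      print(f"n = {n:>9,}: slope = {res.slope:+.4f}, p-value = {res.pvalue:.3g}")

At n = 100 the slope is statistically indistinguishable from zero; by n = 1,000,000 the p-value is microscopic, even though the effect itself is unchanged and still negligible.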

There's really no excuse for this, because there are some well-established guidelines to help us. In fact, as we'll see, some of them have been around since at least the 1970s.

Let's take a quick look at this, because it's something that all students need to be made aware of as we work more and more with "big data". Students certainly won't gain this awareness by looking at the interpretation of the results in the vast majority of empirical economics papers that use even sort-of-large samples!
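
As one concrete illustration of those guidelines: Good (1982) suggested "standardizing" a reported p-value to a notional sample size of 100 before interpreting it. The tiny function below is my own sketch of that rule as I understand it (the square-root scaling and the cap at 0.5 follow the usual statement of the rule, but check Good's paper for the definitive version):

  import math

  def standardized_p(p: float, n: int) -> float:
      # My reading of Good's (1982) rule, not code from the paper:
      # rescale a p-value from a sample of size n to a notional n = 100.
      return min(0.5, p * math.sqrt(n / 100.0))

  # A "significant" p = 0.05 from n = 250,000 observations standardizes to
  # 0.05 * sqrt(2500) = 2.5, which the cap truncates to 0.5: no evidence at all.
  print(standardized_p(0.05, 250_000))   # -> 0.5

On this metric, p-values that look impressive in very large samples often carry essentially no evidence against the null.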

Sunday, October 27, 2019

Reporting an R-Squared Measure for Count Data Models

This post was prompted by an email query that I received some time ago from a reader of this blog. I thought that a more "expansive" response might be of interest to other readers...

In spite of its many limitations, it's standard practice to include the value of the coefficient of determination (R²) - or its "adjusted" counterpart - when reporting the results of a least squares regression. Personally, I think that R² is one of the least important statistics to include in our results, but we all do it. (See this previous post.)

If the regression model in question is linear (in the parameters) and includes an intercept, and if the parameters are estimated by Ordinary Least Squares (OLS), then R² has a number of well-known properties. These include:
  1. 0 ≤ R² ≤ 1.
  2. The value of R² cannot decrease if we add regressors to the model.
  3. The value of R² is the same whether we define this measure as the ratio of the "explained sum of squares" to the "total sum of squares" (R²_E), or as one minus the ratio of the "residual sum of squares" to the "total sum of squares" (R²_R).
  4. There is a correspondence between R² and a significance test on all slope parameters; and there is a correspondence between changes in (the adjusted) R² as regressors are added, and significance tests on the added regressors' coefficients. (See here and here.)
  5. R² has an interpretation in terms of the information content of the data.
  6. R² is the square of the (Pearson) correlation (R²_C) between the actual and "fitted" values of the model's dependent variable.
However, as soon as we're dealing with a model that excludes an intercept, or is non-linear in the parameters, or we use an estimator other than OLS, none of these properties is guaranteed.
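
To make the contrast concrete, here's a small sketch of my own (assuming numpy and statsmodels, neither of which appears in the original post) that computes the three candidate measures R²_E, R²_R, and R²_C, first for an OLS regression, where they coincide, and then for a Poisson count-data model, where they generally differ:

  import numpy as np
  import statsmodels.api as sm

  def three_r2(y, yhat):
      # Return (R2_E, R2_R, R2_C) for actual y and fitted yhat.
      tss = np.sum((y - y.mean()) ** 2)
      r2_e = np.sum((yhat - y.mean()) ** 2) / tss   # explained SS / total SS
      r2_r = 1.0 - np.sum((y - yhat) ** 2) / tss    # 1 - residual SS / total SS
      r2_c = np.corrcoef(y, yhat)[0, 1] ** 2        # squared correlation
      return r2_e, r2_r, r2_c

  rng = np.random.default_rng(42)
  n = 500
  x = rng.normal(size=n)
  X = sm.add_constant(x)

  # Linear model + OLS: all three definitions give the same number.
  y = 1.0 + 0.5 * x + rng.normal(size=n)
  print(three_r2(y, sm.OLS(y, X).fit().fittedvalues))

  # Poisson count model: the three definitions part company, and R2_E and
  # R2_R are no longer even guaranteed to lie in [0, 1].
  y_count = rng.poisson(np.exp(0.5 + 0.8 * x))
  mu_hat = sm.GLM(y_count, X, family=sm.families.Poisson()).fit().fittedvalues
  print(three_r2(y_count.astype(float), mu_hat))

So whichever "pseudo-R²" we choose to report for a count-data model, we should say explicitly which definition we're using.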

Monday, October 7, 2019

October Reading

Here's my latest, and final, list of suggested reading:
  • Bellego, C. and L-D. Pape, 2019. Dealing with the log of zero in regression models. CREST Working Paper No. 2019-13.
  • Castle, J. L., J. A. Doornik, and D. F. Hendry, 2018. Selecting a model for forecasting. Department of Economics, University of Oxford, Discussion Paper 861.
  • Gorajek, A., 2019. The well-meaning economist. Reserve Bank of Australia, Research Discussion Paper RDP 2019-08.
  • Güriş, B., 2019. A new nonlinear unit root test with Fourier function. Communications in Statistics - Simulation and Computation, 48, 3056-3062.
  • Maudlin, T., 2019. The why of the world. Review of The Book of Why: The New Science of Cause and Effect, by J. Pearl and D. Mackenzie. Boston Review.
  • Qian, W., C. A. Rolling, G. Cheng, and Y. Yang, 2019. On the forecast combination puzzle. Econometrics, 7, 39. 

© 2019, David E. Giles