Econometrics Beat: Dave Giles' Blog: 2019

Thursday, October 31, 2019

It's Time to Go

When I released my first post on this blog on 20th February 2011, I really wasn't sure what to expect! After all, I was aiming to reach a somewhat niche audience.

Well, 949 posts and 7.4 million page-hits later, this blog has greatly exceeded my wildest expectations.

However, I'm now retired and I turned 70 three months ago. I've decided to call it quits, and this is my final post.

I'd rather make a definite decision about this than have the blog just fizzle into nothingness.

For now, the Econometrics Beat blog will remain visible, but it will be closed for further comments and questions.

I've had a lot of fun and learned a great deal through this blog. I owe a debt of gratitude to all of you who've followed my posts, made suggestions, asked questions, made helpful comments, and drawn errors to my attention.

I just hope that it's been as positive an experience for you as it has been for me.

Thank you - and enjoy your Econometrics!

Wednesday, October 30, 2019

Everything's Significant When You Have Lots of Data

Well........, not really!

It might seem that way on the face of it, but that's because you're probably using a totally inappropriate measure of what's (statistically) significant, and what's not.

I talked a bit about this issue in a previous post, where I said:

"Granger (1998, 2003) has reminded us that if the sample size is sufficiently large, then it's virtually impossible not to reject almost any hypothesis. So, if the sample is very large and the p-values associated with the estimated coefficients in a regression model are of the order of, say, 0.10 or even 0.05, then this really bad news. Much, much, smaller p-values are needed before we get all excited about 'statistically significant' results when the sample size is in the thousands, or even bigger."

This general point, namely that our chosen significance level should be decreased as the sample size grows, is pretty well understood by most statisticians and econometricians. (For example, see Good, 1982.) However, it's usually ignored by the authors of empirical economics studies based on samples of thousands (or more) of observations. Moreover, a lot of practitioners seem to be unsure of just how much they should revise their significance levels (or re-interpret their p-values) in such circumstances.

There's really no excuse for this, because there are some well-established guidelines to help us. In fact, as we'll see, some of them have been around since at least the 1970s.

Let's take a quick look at this, because it's something that all students need to be made aware of as we work more and more with "big data". Students certainly won't gain this awareness by looking at the interpretation of the results in the vast majority of empirical economics papers that use even sort-of-large samples!
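To see why a fixed significance level breaks down, here's a minimal Python sketch (my own illustration, not taken from the post). It holds a hypothetical, economically trivial sample correlation of r = 0.01 fixed, and computes the t-statistic for a zero slope in a simple regression, t = r * sqrt((n - 2)/(1 - r²)), together with a two-sided p-value from the standard Normal approximation, as the sample size n grows. The function names are hypothetical.

```python
import math

def t_from_correlation(r, n):
    """t-statistic for H0: zero slope in a simple regression, written in
    terms of the sample correlation r and the sample size n."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def two_sided_p_normal(t):
    """Two-sided p-value from the standard Normal approximation:
    2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)). Adequate for large n."""
    return math.erfc(abs(t) / math.sqrt(2.0))

def p_values_for_fixed_effect(r, sample_sizes):
    """Return (n, t, p) for the same correlation r at each sample size."""
    rows = []
    for n in sample_sizes:
        t = t_from_correlation(r, n)
        rows.append((n, t, two_sided_p_normal(t)))
    return rows

if __name__ == "__main__":
    # r = 0.01 is a hypothetical, economically negligible correlation.
    for n, t, p in p_values_for_fixed_effect(0.01, [100, 10_000, 1_000_000]):
        print(f"n = {n:>9,}   t = {t:6.2f}   p = {p:.3g}")
```

The effect size never changes; only n does. The p-value falls from roughly 0.92 at n = 100, to about 0.32 at n = 10,000, and to something astronomically small at n = 1,000,000. A conventional 5% level would declare the same negligible effect "significant" simply because the sample got bigger.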

Reporting an R-Squared Measure for Count Data Models

This post was prompted by an email query that I received some time ago from a reader of this blog. I thought that a more "expansive" response might be of interest to other readers............

In spite of its many limitations, it's standard practice to include the value of the coefficient of determination (R²) - or its "adjusted" counterpart - when reporting the results of a least squares regression. Personally, I think that R² is one of the least important statistics to include in our results, but we all do it. (See this previous post.)

If the regression model in question is linear (in the parameters) and includes an intercept, and if the parameters are estimated by Ordinary Least Squares (OLS), then R² has a number of well-known properties. These include:

0 ≤ R² ≤ 1.
The value of R² cannot decrease if we add regressors to the model.
The value of R² is the same, whether we define this measure as the ratio of the "explained sum of squares" to the "total sum of squares" (R_E²); or as one minus the ratio of the "residual sum of squares" to the "total sum of squares" (R_R²).
There is a correspondence between R² and a significance test on all slope parameters; and there is a correspondence between changes in (the adjusted) R² as regressors are added, and significance tests on the added regressors' coefficients. (See here and here.)
R² has an interpretation in terms of information content of the data.
R² is the square of the (Pearson) correlation (R_C²) between actual and "fitted" values of the model's dependent variable.

However, as soon as we're dealing with a model that excludes an intercept or is non-linear in the parameters, or we use an estimator other than OLS, none of the above properties are guaranteed.
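The breakdown of those properties is easy to check numerically. Here's a minimal Python sketch on a small made-up data set (the numbers and function names are my own, purely for illustration). It computes the three versions of R² listed above, with every sum of squares taken around the mean of y, first for an OLS fit with an intercept and then for an OLS fit through the origin.

```python
def fitted_with_intercept(x, y):
    """OLS fitted values for y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    a = ybar - b * xbar
    return [a + b * xi for xi in x]

def fitted_through_origin(x, y):
    """OLS fitted values for y = b*x (no intercept)."""
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return [b * xi for xi in x]

def r2_measures(y, yhat):
    """Return (R_E^2, R_R^2, R_C^2), all sums of squares centred on mean(y)."""
    n = len(y)
    ybar, fbar = sum(y) / n, sum(yhat) / n
    tss = sum((yi - ybar) ** 2 for yi in y)                 # total SS
    ess = sum((fi - ybar) ** 2 for fi in yhat)              # "explained" SS
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))    # residual SS
    sxy = sum((yi - ybar) * (fi - fbar) for yi, fi in zip(y, yhat))
    sff = sum((fi - fbar) ** 2 for fi in yhat)
    r_e = ess / tss
    r_r = 1.0 - rss / tss
    r_c = sxy ** 2 / (tss * sff)   # squared Pearson correlation of y and yhat
    return r_e, r_r, r_c

if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]   # illustrative data only
    y = [3, 4, 6, 5, 8]
    print("with intercept:", r2_measures(y, fitted_with_intercept(x, y)))
    print("through origin:", r2_measures(y, fitted_through_origin(x, y)))
```

With the intercept, all three measures agree (about 0.818). Forcing the line through the origin, the same data give roughly 1.81, 0.596 and 0.818, so R_E² even exceeds one. The squared correlation happens to be unchanged here only because, in a simple regression, the fitted values are a linear function of x either way.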

October Reading

Here's my latest, and final, list of suggested reading:

Bellego, C. and L-D. Pape, 2019. Dealing with the log of zero in regression models. CREST Working Paper No. 2019-13.
Castle, J. L., J. A. Doornik, and D. F. Hendry, 2018. Selecting a model for forecasting. Department of Economics, University of Oxford, Discussion Paper 861.
Gorajek, A., 2019. The well-meaning economist. Reserve Bank of Australia, Research Discussion Paper RDP 2019-08.
Güriş, B., 2019. A new nonlinear unit root test with Fourier function. Communications in Statistics - Simulation and Computation, 48, 3056-3062.
Maudlin, T., 2019. The why of the world. Review of The Book of Why: The New Science of Cause and Effect, by J. Pearl and D. Mackenzie. Boston Review.
Qian, W., C. A. Rolling, G. Cheng, and Y. Yang, 2019. On the forecast combination puzzle. Econometrics, 7, 39.

Sunday, September 1, 2019

Back to School Reading

Here we are - it's Labo(u)r Day weekend already in North America, and we all know what that means! It's back to school time.

You'll need a reading list, so here are some suggestions:

Franses, Ph. H. B. F., 2019. Professional forecasters and January. Econometric Institute Research Papers EI2019-25, Erasmus University Rotterdam.
Harvey, A. & R. Ito, 2019. Modeling time series when some observations are zero. Journal of Econometrics, in press.
Leamer, E. E., 1978. Specification Searches: Ad Hoc Inference With Nonexperimental Data. Wiley, New York. (This is a legitimate free download.)
MacKinnon, J. G., 2019. How cluster-robust inference is changing applied econometrics. Working Paper 1413, Economics Department, Queen's University.
Steel, M. F. J., 2019. Model averaging and its use in economics. Mimeo., Department of Statistics, University of Warwick.
Stigler, S. M., 1981. Gauss and the invention of least squares. Annals of Statistics, 9, 465-474.

Tuesday, August 20, 2019

Book Series on "Statistical Reasoning in Science & Society"

Back in early 2016, the American Statistical Association (ASA) made an announcement in its newsletter, Amstat News, about the introduction of an important new series of books. In part, that announcement said:

"The American Statistical Association recently partnered with Chapman & Hall/CRC Press to launch a book series called the ASA-CRC Series on Statistical Reasoning in Science and Society.

'The ASA is very enthusiastic about this new series,' said 2015 ASA President David Morganstein, under whose leadership the arrangement was made. 'Our strategic plan includes increasing the visibility of our profession. One way to do that is with books that are readable, exciting, and serve a broad audience having a minimal background in mathematics or statistics.'

The Chapman & Hall/CRC press release states the book series will do the following:

Highlight the important role of statistical and probabilistic reasoning in many areas

Require minimal background in mathematics and statistics

Serve a broad audience, including professionals across many fields, the general public, and students in high schools and colleges

Cover statistics in wide-ranging aspects of professional and everyday life, including the media, science, health, society, politics, law, education, sports, finance, climate, and national security

Feature short, inexpensive books of 100–150 pages that can be written and read in a reasonable amount of time."

Seven titles have now been published in this series -

Measuring Society, by Chaitra H. Nagaraja (2019)
Measuring Crime: Behind the Statistics, by Sharon L. Lohr (2019)
Statistics and Health Care Fraud: How to Save Billions, by Tahir Ekin (2019)
Improving Your NCAA® Bracket with Statistics, by Tom Adams (2018)
Data Visualization: Charts, Maps, and Interactive Graphics, by Robert Grant (2018)
Visualizing Baseball, by Jim Albert (2017)
Errors, Blunders, and Lies: How to Tell the Difference, by David S. Salsburg (2017)

Readers of this blog should be especially interested in Chaitra Nagaraja's recently published addition to this series. Chaitra devotes chapters in her book to the topics of Jobs, Inequality, Housing, Prices, Poverty, and Deprivation. I particularly like the historical perspective that Chaitra provides in this very readable contribution, and I recommend her book to you (and your non-economist friends).

Wednesday, August 14, 2019

Check out What Happened at the 2019 Joint Statistical Meetings

Each year, the Joint Statistical Meetings (JSM) bring together thousands (6,500 this year) of statisticians at the largest gathering of its type in the world. The JSM represent eleven international statistics organisations, including the four founding organisations - The American Statistical Association (ASA), The International Biometric Society, The Institute of Mathematical Statistics, and The Statistical Society of Canada.

As a member of the ASA since 1973 I've attended a few of these meetings over the years, but unfortunately I didn't make it to the JSM in Denver at the end of last month. As always, the program was amazing.

Yesterday, the ASA released a searchable version of the 2019 program that contains downloadable files of the slides used by many of the speakers. You can find that version of the program here. When you go through the program, look for presentations that have a blue (rectangular) "Presentation" button. Papers in sessions sponsored by the Business and Economic Statistics section of the ASA may be of special interest to you - but there's lots to choose from!

Tuesday, August 6, 2019

Including More History in Your Econometrics Teaching

If you follow this blog (or if you look at the "History of Econometrics" label in the word cloud in the right side-bar), you'll know that I have more than a passing interest in the history of our discipline. There's so much to be learned from this history. Among other things, we can gain insights into why certain methods became popular, and we can reduce the risk of repeating earlier mistakes!

When I was teaching I liked to inject a few historical facts/anecdotes/curiosities into my classes. I think that this brought the subject matter to life a little. The names behind the various theorems, tests, and estimators are those of real people, after all.

There are some excellent books on the history of econometrics, including those by Epstein (1987), Morgan (1990), and De Marchi and Gilbert (1990). (Also, see the short piece by Stephen Pollock, 2014.)

However, I think that we could do more in terms of making material about this history accessible to our students.

The Statistics community has gone much further in this direction, and we might take note of this.

The other day, Amanda Golbeck posted some very helpful links on the American Statistical Association's "History of Statistics Interest Group" community noticeboard.

Here's her posting in its entirety - and don't miss the first of her links:

"Why not include more history in your teaching? The History of Statistics Interest Group library has a collection of Activities for Classes: community.amstat.org/historyofstats/ourlibrary/...

We are pleased to let you know that Bob Rosenfeld has created 13 history of probability and statistics teaching modules, and he has kindly made them available for you to use in your classes! We hope you will find them to be useful.

Reading and Exercises on the History of Probability from the Vermont Mathematics Initiative, Bob Rosenfeld

Pre-history to 1600 (PDF)
17th Century France (PDF)
Jacob Bernoulli - Law of Large Numbers (PDF)
Inverse Probability - Thomas Bayes (PDF)
Laplace (PDF)

Reading and Exercises on the History of Statistics from the Vermont Mathematics Initiative, Bob Rosenfeld

John Graunt and the Bills of Mortality (PDF)
Origin of the Normal Curve (PDF)
Origins of Graphs in Statistics (PDF)
Fitting models to data - the Path to Least Squares (PDF)
Statistics Moves from Physical to Social Sciences (PDF)
Correlation - Francis Galton (PDF)
t-Distribution and Gosset (PDF)
Fisher and Design of Experiments (PDF)"

(Bob Rosenfeld is a former Co-Director for Statistics and School-Based Research at the Vermont Mathematics Initiative, and the author of a number of books on the teaching of statistics to K-8 students. D.G.)

Most of Bob Rosenfeld's pieces are directly relevant to econometrics students. It would be nice to see more material about the history of our discipline that could be incorporated into introductory econometrics courses.

References

De Marchi, N. & C. Gilbert, 1990. History and Methodology of Econometrics. Oxford University Press, Oxford.

Epstein, R. J. 1987. A History of Econometrics. North-Holland, Amsterdam.

Morgan, M. S., 1990. The History of Econometric Ideas. Cambridge University Press, Cambridge.

Pollock, D. S. G., 2014. Econometrics - An historical guide for the uninitiated. Working Paper No. 14/05, Department of Economics, University of Leicester.

Friday, August 2, 2019

Sunday, July 28, 2019

AAEA Meeting, 2019

The Agricultural and Applied Economics Association (AAEA) recently held its annual meeting in Atlanta, GA. You can find the extensive program here.

This year, I was fortunate enough to be able to attend and participate.

This was thanks to the kind invitation of Marc Bellemare, a member of the Executive Board of the AAEA, and (of course) a blogger whom many of you no doubt follow. (If you don't, then you should!)

Marc arranged a session in which he and I talked about the pros and cons of The Cookbook Approach to Teaching Econometrics. The session was well attended, and the bulk of the time was devoted to a very helpful discussion-question-answer period with the audience.

As you'll know from some of my previous posts (e.g., here and here), I'm not a big fan of The Cookbook Approach - at least, not if it's the primary/sole way of teaching econometrics. Marc made the point that there's a place for this approach if it's adopted after more formal courses in econometrics. I'm in agreement with that.

I put together a few background talking-point slides for my short presentation. For what they're worth, you'll find them here.

I really enjoyed my time at the AAEA meeting, and I learned a lot. Thanks, Marc, and thank you to the participants!

Saturday, July 6, 2019

Seasonal Unit Roots - Background Information

A recent email query about the language that we use in the context of non-stationary seasonal data, and how we should respond to the presence of "seasonal unit roots", suggested to me that a short background post about some of this might be in order.

To get the most from what follows, I suggest that you take a quick look at this earlier post of mine - especially to make sure that you understand the distinction between "deterministic seasonality" and "stochastic seasonality" in time-series data.

There's an extensive econometrics literature on stochastic seasonality and testing for seasonal unit roots, and this dates back at least to 1990. This is hardly a new topic, but it's one that's often overlooked in empirical applications.

Although several tests for seasonal unit roots are available, the most commonly used one is that proposed by Hylleberg et al. (1990) - hereafter "HEGY". Depending on what statistical/econometrics package you prefer to use, you'll have at least some access to the HEGY test(s), and perhaps some others. For instance, there are routines that you can use with R, Stata, and Gretl.

The EViews package includes a rather complete built-in suite of different seasonal unit root tests for time series data with various periodicities - 2, 4, 5, 6, 7, and 12. This enables us to deal with trading-day weekly data, and calendar weekly data, as well as the usual "seasonal" frequencies.

I'm not going to be going over the tests themselves here.

Rather, the objectives of this post are, first, to provide a bit of background information about the language that's used when we're talking about seasonal unit roots. For instance, why do we refer to roots at the zero and π frequencies, etc.? Second, to explain in what way(s) we need to filter a time series in order to remove the unit roots at the various frequencies.

Let's begin by considering a quarterly time series, X_t (t = 1, 2, ...). We'll use the symbol "L" to denote the lag operator. So, L(X_t) = X_{t-1}; L²(X_t) = L(L(X_t)) = L(X_{t-1}) = X_{t-2}; and so on. In general, L^k(X_t) = X_{t-k}.
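With this notation, the quarterly seasonal difference factors as (1 - L^4) = (1 - L)(1 + L)(1 + L²). Setting 1 - z^4 = 0 gives the roots z = 1 (the zero frequency), z = -1 (the π frequency, a cycle of two quarters) and z = ±i (the π/2 frequency, a cycle of four quarters, i.e., once a year). Here's a minimal Python sketch of that bookkeeping, under my own conventions: a lag polynomial is stored as a hypothetical coefficient list [c_0, c_1, ...], where c_k multiplies L^k, and the helper names are illustrative. It multiplies the three factors back together, applies the resulting filter to a series, and recovers each root's frequency as its angle in the complex plane.

```python
import cmath

def polymul(a, b):
    """Multiply two lag polynomials stored as [c0, c1, ...] (c_k multiplies L^k)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def apply_filter(coeffs, x):
    """Apply sum_k c_k L^k to the series x; the first len(coeffs) - 1
    observations are lost to the lags."""
    p = len(coeffs) - 1
    return [sum(c * x[t - k] for k, c in enumerate(coeffs)) for t in range(p, len(x))]

# The three factors of the quarterly seasonal difference (1 - L^4)
ZERO_FREQ = [1, -1]       # (1 - L):   root z = 1,    frequency 0
PI_FREQ = [1, 1]          # (1 + L):   root z = -1,   frequency pi
HALF_PI_FREQ = [1, 0, 1]  # (1 + L^2): roots z = +/-i, frequency pi/2

SEASONAL_DIFF = polymul(polymul(ZERO_FREQ, PI_FREQ), HALF_PI_FREQ)

if __name__ == "__main__":
    print(SEASONAL_DIFF)  # coefficients of 1 - L^4
    for z in (1, -1, 1j, -1j):
        assert abs(1 - z ** 4) < 1e-12   # each z is a root of 1 - z^4
        print(z, "frequency:", abs(cmath.phase(z)))
    # A fixed quarterly pattern repeated over three years is removed by (1 - L^4)
    print(apply_filter(SEASONAL_DIFF, [1, 2, 3, 4] * 3))
```

Filtering with (1 - L) alone removes only the zero-frequency unit root; (1 + L) and (1 + L²) handle the π and π/2 frequencies respectively. That's why, in the HEGY framework, the usual idea is to apply only those factors that correspond to unit roots the tests actually detect, rather than automatically taking fourth differences.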

July Reading

This month my reading list is a bit different from the usual one. I've taken a look back at past issues of Econometrica and Journal of Econometrics, and selected some important and interesting papers that happened to be published in July issues of those journals.

Here's what I came up with for you:

Aigner, D., C. A. K. Lovell, & P. Schmidt, 1977. Formulation and estimation of stochastic frontier production function models. Journal of Econometrics, 6, 21-37.
Chow, G. C., 1960. Tests of equality between sets of coefficients in two linear regressions. Econometrica, 28, 591-605.
Davidson, R. & J. G. MacKinnon, 1984. Convenient specification tests for logit and probit models. Journal of Econometrics, 25, 241-262.
Dickey, D. A. & W. A. Fuller, 1981. Likelihood ratio statistics for autoregressive time series with a unit root. Econometrica, 49, 1057-1072.
Granger, C. W. J. & P. Newbold, 1974. Spurious regressions in econometrics. Journal of Econometrics, 2, 111-120.
Sargan, J. D., 1961. The maximum likelihood estimation of economic relationships with autoregressive residuals. Econometrica, 29, 414-426.

Friday, June 21, 2019

Consulting Can be Fun!

Over the years, I've done a modest amount of paid econometrics consulting work - in the U.S., New Zealand, Australia, the U.K., and here in Canada. Each job has been interesting and rewarding, and I've always learned a great deal from the briefs that I've undertaken.

The other day, a friend asked me, "Which consulting job was the most fun?"

Actually, the answer was easy!

A few years ago I consulted for the Office of the Auditor General of Canada, in Ottawa. I was brought in because I had consulted for Revenue New Zealand on the issue of tax evasion, and I had co-authored a book on the Canadian "underground economy" with Lindsay Tedds.

So what was the consulting work with the Auditor General's office all about? Well, they were conducting an audit of what was then called Revenue Canada (now the Canada Revenue Agency). In other words, "the tax man"!

Although the report arising from this audit is a matter of public record, I won't go into it here.

Suffice it to say, what could be more fun than conducting an audit of your country's tax authority?

Thursday, June 20, 2019

2019 Edition of the INOMICS Handbook

I'm sure that all readers will be familiar with INOMICS, and the multitude of resources that they make available to economists.

The INOMICS Handbook, 2019 is now available, and I commend it to you.

This year's edition of the Handbook includes material relating to:

The gender bias in the field of economics
The soft skills you need to succeed as an economist
Climate change and how economics can help solve it
What makes a successful economist
An exclusive interview with Princeton Professor, Esteban Rossi-Hansberg
Winners of the INOMICS Awards 2019
Recommended study and career opportunities

Tuesday, June 11, 2019

More Tributes to Clive Granger

As a follow-up to my recent post, "Clive Granger Special Issue", I received an email from Eyüp Çetin (Editor of the European Journal of Pure and Applied Mathematics).

Eyüp kindly pointed out that "......... actually, we published the first special issue dedicated to his memory exactly on 27 May 2010, the first anniversary of his passing at https://www.ejpam.com/index.php/ejpam/issue/view/11

We think this was the first special issue dedicated to his memory in the world. The Table of Contents may be found here https://www.ejpam.com/index.php/ejpam/issue/view/11/showToc .

Another remarkable point that we also published some personal and institutional tributes and some memorial stories for Sir Granger that never appeared elsewhere before at

https://www.ejpam.com/index.php/ejpam/article/view/805 .

Some institutions such as Royal Statistical Society, Japan Statistical Society and University of Canterbury have sent their tributes to this special volume."

Friday, June 7, 2019

Clive Granger Special Issue

The recently published Volume 10, No. 1 issue of the European Journal of Pure and Applied Mathematics takes the form of a memorial issue for Clive Granger. You can find the Table of Contents here, and all of the articles can be downloaded freely.

This memorial issue is co-edited by Jennifer Castle and David Hendry. The contributed papers include ones that deal with Forecasting, Cointegration, Nonlinear Time Series, and Model Selection.

This is a fantastic collection of important survey-type papers that you simply must read!

Friday, May 31, 2019

Reading Suggestions for June

Well, here we are - it's June already.

Here are my reading suggestions:

Abadie, A., S. Athey, G. Imbens, & J. Wooldridge, 2017. When should you adjust standard errors for clustering? Mimeo.
Berk, R., A. Buja, L. Brown, E. George, A. K. Kuchibhotla, W. Su, & L. Zhao, 2019. Assumption lean regression. American Statistician, in press.
Ghosh, T., M. Ghosh, & T. Kubokawa, 2019. On the loss robustness of least-square estimators, American Statistician, in press.
Gustafsson, O. & P. Stockhammar, 2019. Variance stabilizing filters. Communications in Statistics - Theory and Methods, in press.
Kocherlakota, N. R., 2019. A near-exact finite sample theory for an instrumental variable estimator. Mimeo. (Hat-tip to Frank Diebold.)
Panagiotelis, A., G. Athanasopoulos, R. J. Hyndman, B. Jiang, & F. Vahid, 2019. Macroeconomic forecasting for Australia using a large number of predictors. International Journal of Forecasting, 35, 613-633.

Sunday, May 19, 2019

Update on the "Series of Unsurprising Results in Economics"

In June of last year I had a post about a new journal, Series of Unsurprising Results in Economics (SURE).

If you didn't get to read that post, I urge you to do so.

More importantly, you should definitely take a look at this piece by Kelsey Piper, from a couple of days ago, and titled, "This economics journal only publishes results that are no big deal - Here’s how that might save science".

Kelsey really understands the rationale for SURE, and the important role that it can play in terms of reducing publication bias, and assisting with replicating results.

You can get a feel for what SURE has to offer by checking out this paper by Nick Huntington-Klein and Andrew Gill, which SURE is publishing.

We'll all be looking forward to more excellent papers like this!

Wednesday, May 1, 2019

May Reading List

Here's a selection of suggested reading for this month:

Athey, S. & G. W. Imbens, 2019. Machine learning methods economists should know about. Mimeo.
Bhagwat, P. & E. Marchand, 2019. On a proper Bayes but inadmissible estimator. American Statistician, online.
Canals, C. & A. Canals, 2019. When is n large enough? Looking for the right sample size to estimate proportions. Journal of Statistical Computation and Simulation, 89, 1887-1898.
Cavaliere, G. & A. Rahbek, 2019. A primer on bootstrap testing of hypotheses in time series models: With an application to double autoregressive models. Discussion Paper 19-03, Department of Economics, University of Copenhagen.
Chudik, A. & G. Georgiadis, 2019. Estimation of impulse response functions when shocks are observed at a higher frequency than outcome variables. Globalization Institute Working Paper 356, Federal Reserve Bank of Dallas.
Reschenhofer, E., 2019. Heteroscedasticity-robust estimation of autocorrelation. Communications in Statistics - Simulation and Computation, 48, 1251-1263.

Monday, April 29, 2019

Recursions for the Moments of Some Continuous Distributions

This post follows on from my recent one, Recursions for the Moments of Some Discrete Distributions. I'm going to assume that you've read the previous post, so this one will be shorter.

What I'll be discussing here are some useful recursion formulae for computing the moments of a number of continuous distributions that are widely used in econometrics. The coverage won't be exhaustive, by any means. I provide some motivation for looking at formulae such as these in the previous post, so I won't repeat it here.

When we deal with the Normal distribution, below, we'll make explicit use of Stein's Lemma. Several of the other results are derived (behind the scenes) by using a very similar approach. So, let's begin by stating this Lemma.

Stein's Lemma (Stein, 1973):

"If X ~ N[θ , σ²], and if g(.) is a differentiable function such that E|g'(X)| is finite, then

E[g(X)(X - θ)] = σ² E[g'(X)]."
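To see why the lemma is so useful for moment recursions, set g(x) = x^(k-1), so that g'(x) = (k-1)x^(k-2). The lemma then gives E[X^k] - θE[X^(k-1)] = (k-1)σ²E[X^(k-2)], that is, E[X^k] = θE[X^(k-1)] + (k-1)σ²E[X^(k-2)], starting from E[X^0] = 1 and E[X] = θ. Here's a minimal Python sketch of that recursion (the function name is my own, for illustration only):

```python
def normal_raw_moments(theta, sigma2, kmax):
    """Return [E[X^0], ..., E[X^kmax]] for X ~ N[theta, sigma2], using
    E[X^k] = theta * E[X^(k-1)] + (k - 1) * sigma2 * E[X^(k-2)],
    which follows from Stein's Lemma with g(x) = x^(k-1)."""
    m = [1, theta]  # E[X^0] = 1, E[X^1] = theta
    for k in range(2, kmax + 1):
        m.append(theta * m[k - 1] + (k - 1) * sigma2 * m[k - 2])
    return m[: kmax + 1]

if __name__ == "__main__":
    print(normal_raw_moments(0, 1, 6))  # standard Normal
    print(normal_raw_moments(1, 1, 4))  # N[1, 1]
```

With θ = 0 and σ² = 1, this reproduces the familiar 1, 0, 1, 0, 3, 0, 15 pattern of standard Normal moments. With θ = 1 and σ² = 1, the second moment is σ² + θ² = 2, as it should be.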

It's worth noting that although this lemma relates to a single Normal random variable, in the bivariate Normal case the lemma generalizes to:

"If X and Y follow a bivariate Normal distribution, and if g(.) is a differentiable function such that E|g'(Y)| is finite, then

Cov.[g(Y), X] = Cov.(X, Y) E[g'(Y)]."

In this latter form, the lemma is useful in asset pricing models.

There are extensions of Stein's Lemma to a broader class of univariate and multivariate distributions. For example, see Alghalith (undated), and Landsman et al. (2013), and the references in those papers. Generally, if a distribution belongs to an exponential family, then recursions for its moments can be obtained quite easily.

Now, let's get down to business............

Recursions for the Moments of Some Discrete Distributions

You could say, "Moments maketh the distribution". While that's not quite true, it's pretty darn close.

The moments of a probability distribution provide key information about the underlying random variable's behaviour, and we use these moments for a multitude of purposes. Before proceeding, let's be sure that we're on the same page here.
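As a concrete example of the kind of recursion involved (a standard result, not necessarily one of those covered in this post), the raw moments of a Poisson(λ) random variable satisfy E[X^(r+1)] = λ Σ_{j=0}^{r} C(r, j) E[X^j]. This follows from writing x·λ^x/x! = λ·λ^(x-1)/(x-1)! and then expanding (Y + 1)^r binomially, where Y is also Poisson(λ). Here's a minimal Python sketch, with hypothetical function names, that checks the recursion against brute-force summation over the probability mass function:

```python
import math

def poisson_raw_moments(lam, kmax):
    """E[X^0], ..., E[X^kmax] for X ~ Poisson(lam), via the recursion
    E[X^(r+1)] = lam * sum_{j=0}^{r} C(r, j) * E[X^j]."""
    m = [1.0]
    for r in range(kmax):
        m.append(lam * sum(math.comb(r, j) * m[j] for j in range(r + 1)))
    return m

def poisson_raw_moment_direct(lam, k, upper=200):
    """Brute-force E[X^k] by summing x^k * P(X = x) for x = 0, ..., upper - 1
    (the truncation is harmless for moderate lam)."""
    total, prob = 0.0, math.exp(-lam)  # prob holds P(X = x), starting at x = 0
    for x in range(upper):
        total += x ** k * prob
        prob *= lam / (x + 1)          # P(X = x + 1) from P(X = x)
    return total

if __name__ == "__main__":
    lam = 2.0
    print(poisson_raw_moments(lam, 4))
    print([poisson_raw_moment_direct(lam, k) for k in range(5)])
```

The recursion needs only the lower-order moments, with no summation over the support, which is the practical appeal of formulae like these.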

2019 Econometric Game Results

The Econometric Game is over for another year.

The winning team for 2019 was from the University of Melbourne.

The second- and third-placed teams were from Maastricht University and Aarhus University, respectively.

Congratulations to the winning teams, and to all who competed this year!

© 2019, David E. Giles

Wednesday, April 10, 2019

EViews 11 Now Available

As you'll know already, I'm a big fan of the EViews econometrics package. I always found it to be a terrific, user-friendly, resource when teaching economic statistics and econometrics, and I use it extensively in my own research.

Along with a lot of other EViews users, I recently had the opportunity to "test drive" the beta release of the latest version of this package, EViews 11.

EViews 11 has now been officially released, and it has some great new features. (Click on the links there to see some really helpful videos.) To see what's now available, check it out here.

Nice update. Thanks!

Tuesday, April 9, 2019

SHAZAM!

This past weekend the new movie, Shazam, topped the box-office revenue list with over US$53 million - and this was its first weekend since being released.

Not bad!

Of course, in the Econometrics World, we associate the word, SHAZAM, with Ken White's famous computing package, which has been with us since 1977.

Ken and I go way back. A few years ago I had a post about the background to the SHAZAM package. In that post I explained what the acronym "SHAZAM" stands for. If you check it out you'll see why it's timely for you to know these important historical facts!

And while you're there, take a look at the links to other tales that illustrate Ken's well-known wry sense of humour.

Monday, April 8, 2019

A Permutation Test Regression Example

In a post last week I talked a bit about Permutation (Randomization) tests, and how they differ from the (classical parametric) testing procedure that we generally use in econometrics. I'm going to assume that you've read that post.

(There may be a snap quiz at some point!)

I promised that I'd provide a regression-based example. After all, the two examples that I went through in that previous post were designed to expose the fundamentals of permutation/randomization testing. They really didn't have much "econometric content".

In what follows I'll use the terms "permutation test" and "randomization test" interchangeably.

What we'll do here is to take a look at a simple regression model and see how we could use a randomization test to see if there is a linear relationship between a regressor variable, x, and the dependent variable, y. Notice that I said a "simple regression" model. That means that there's just the one regressor (apart from an intercept). Multiple regression models raise all sorts of issues for permutation tests, and we'll get to that in due course.

There are several things that we're going to see here:

How to construct a randomization test of the hypothesis that the regression slope coefficient is zero.
A demonstration that the permutation test is "exact". That is, its significance level is exactly what we assign it to be.
A comparison between a permutation test and the usual t-test for this problem.
A demonstration that the permutation test remains "exact", even when the regression model is mis-specified by fitting it through the origin.
A comparison of the powers of the randomization test and the t-test under this model mis-specification.
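The first item in that list can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the code behind this post. Under the null hypothesis of no linear relationship, the y values are exchangeable, so we shuffle y against a fixed x, recompute the OLS slope each time, and count how often the permuted |slope| is at least as large as the observed one. Because the sample standard deviations of x and y don't change under permutation, ranking by |slope| is the same as ranking by |t|. The (count + 1)/(n_perm + 1) form is a common convention for a Monte Carlo permutation p-value. The function and parameter names are hypothetical.

```python
import random

def ols_slope(x, y):
    """OLS slope from regressing y on an intercept and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

def permutation_slope_test(x, y, n_perm=9999, seed=None):
    """Two-sided randomization p-value for H0: slope = 0."""
    rng = random.Random(seed)
    observed = abs(ols_slope(x, y))
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any link between x and y
        if abs(ols_slope(x, y_perm)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

if __name__ == "__main__":
    x = list(range(20))
    y = [2.0 * xi + (0.5 if xi % 2 else -0.5) for xi in x]  # made-up data
    print(permutation_slope_test(x, y, n_perm=999, seed=1))
```

To mimic the "exactness" demonstration, you could call this function repeatedly on data generated with a true slope of zero, and check that the proportion of p-values at or below the chosen significance level is close to that level.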

What is a Permutation Test?

Permutation tests, which I'll be discussing in this post, aren't that widely used by econometricians. However, they shouldn't be overlooked.

Let's begin with some background discussion to set the scene. This might seem a bit redundant, but it will help us to see how permutation tests differ from the sort of tests that we usually use in econometrics.

Background Motivation

When you took your first course in economic statistics, or econometrics, no doubt you encountered some of the basic concepts associated with testing hypotheses. I'm sure that the first exposure that you had to this was actually in terms of "classical", Neyman-Pearson, testing.

It probably wasn't described to you in so many words. It would have just been "statistical hypothesis testing". The whole procedure would have been presented, more or less, along the following lines:

Some April Reading for Econometricians

Here are my suggestions for this month:

Hyndman, R. J., 2019. A brief history of forecasting competitions. Working Paper 03/19, Department of Econometrics and Business Statistics, Monash University.
Kuffner, T. A. & S. G. Walker, 2019. Why are p-values controversial? American Statistician, 73, 1-3.
Sargan, J. D., 1958. The estimation of economic relationships using instrumental variables. Econometrica, 26, 393-415. (Read for free online.)
Sokal, A. D., 1996. Transgressing the boundaries: Towards a transformative hermeneutics of quantum gravity. Social Text, 46/47, 217-252.
Zeng, G. & E. Zeng, 2019. On the relationship between multicollinearity and separation in logistic regression. Communications in Statistics - Simulation and Computation, published online.
Zhang, X., S. Paul, & Y-G. Yang, 2019. Small sample bias correction or bias reduction? Communications in Statistics - Simulation and Computation, published online.

Friday, March 29, 2019

Infographics Parades

When I saw Myko Clelland's tweet this morning, my reaction was "Wow! Just, wow!"

Myko (@DapperHistorian) kindly pointed me to the source of this photo that he tweeted about:

It appears on page 343 of Willard Cope Brinton's book, Graphic Methods for Presenting Facts (McGraw-Hill, 1914).

Myko included a brief description in his tweet, but let me elaborate by quoting from pp.342-343 of Brinton's book, and you'll see why I liked the photo so much:

"Educational material shown in parades gives an effective way for reaching vast numbers of people. Fig. 238 illustrates some of the floats used in presenting statistical information in the municipal parade by the employees of the City of New York, May 17, 1913. The progress made in recent years by practically every city department was shown by comparative models, charts, or large printed statements which could be read with ease from either side of the street. Even though the day of the parade was rainy, great crowds lined the sidewalks. There can be no doubt that many of the thousands who saw the parade came away with the feeling that much is being accomplished to improve the conditions of municipal management. A great amount of work was necessary to prepare the exhibits, but the results gave great reward."

Don't you just love it? A gigantic mobile poster session!

Thursday, March 21, 2019

A World Beyond p < 0.05

The American Statistical Association has just published a special supplementary issue of The American Statistician, titled Statistical Inference in the 21st. Century: A World Beyond p < 0.05.

This entire issue is open-access. In addition to an excellent editorial, Moving to a World Beyond "p < 0.05" (by Ronald Wasserstein, Allen Schirm, and Nicole Lazar), it comprises 43 articles with titles such as:

The p-Value Requires Context, Not a Threshold (by Rebecca Betensky)
The False Positive Risk: A Proposal Concerning What to do About p-Values (by David Colquhoun)
What Have we (Not) Learnt From Millions of Scientific Papers With P Values? (by John Ioannidis)
Three Recommendations for Improving the Use of p-Values (by Daniel Benjamin and James Berger)

I'm sure that you get the idea of what this supplementary issue is largely about.

But look back at its title - Statistical Inference in the 21st. Century: A World Beyond p < 0.05. It's not simply full of criticisms. There's a heap of excellent, positive, and constructive material in there.

Highly recommended reading!

Wednesday, March 20, 2019

The 2019 Econometric Game

The annual World Championship of Econometrics, The Econometric Game, is nearly upon us again!

Readers of this blog will be familiar with "The Game" from posts relating to this event in previous years. For example, see here for some 2018 coverage.

This year The Econometric Game will be held from 10 to 12 April. As usual, it is being organized by the study association for Actuarial Science, Econometrics & Operational Research (VSAE) of the University of Amsterdam.

Teams of graduate students from around the globe will be competing for the top prize on the basis of their analysis of econometrics case studies. The top three teams in 2018 were from Universidad Carlos III Madrid, Harvard University, and Aarhus University.

Check out this year's Game, and I'll post more on it next month.

(30 March, 2019 update - This year's theme has now been announced. It's "Climate Econometrics".)

Wednesday, March 13, 2019

Forecasting After an Inverse Hyperbolic Sine Transformation

There are all sorts of good reasons why we sometimes transform the dependent variable (y) in a regression model before we start estimating. One example would be where we want to be able to reasonably assume that the model's error term is normally distributed. (This may be helpful for subsequent finite-sample inference.)

If the model has non-random regressors, and the error term is additive, then a normal error term implies that the dependent variable is also normally distributed. But it may be quite plain to us (even from simple visual observation) that the sample of data for the y variable really can't have been drawn from a normally distributed population. In that case, a functional transformation of y may be in order.

So, suppose that we estimate a model of the form

$$ f(y_i) = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \cdots + \beta_k x_{ik} + \varepsilon_i ; \qquad \varepsilon_i \sim \text{iid } N[0, \sigma^2] \tag{1} $$

where f(.) is usually a one-to-one function, so that f^-1(.) is uniquely defined. Examples include f(y) = log(y) (throughout this post, log(a) means the natural logarithm of 'a'), and f(y) = √y (if we restrict ourselves to the positive square root).

Having estimated the model, we may then want to generate forecasts of y itself, not of f(y). This is where the inverse transformation, f^-1(.), comes into play.
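For the transformation in this post's title, f(y) = asinh(y) = log(y + √(y² + 1)), and the inverse is sinh(.). As a minimal sketch, assume the error in (1) is N[0, σ²] and, purely for illustration, treat the fitted mean μ and σ as known. Because sinh(.) is monotonic, sinh(μ) is the *median* of y, not its mean. Writing sinh(z) = (e^z - e^(-z))/2 and using E[exp(±Z)] = exp(±μ + σ²/2) for Z ~ N[μ, σ²] gives E[y] = sinh(μ)·exp(σ²/2). The code below checks this by simulation; the parameter values are arbitrary, and in practice μ and σ² would be replaced by their estimates.

```python
import math
import random

def ihs(y):
    """Inverse hyperbolic sine transform: log(y + sqrt(y**2 + 1))."""
    return math.asinh(y)

def ihs_mean_forecast(mu, sigma):
    """E[y] when asinh(y) ~ N(mu, sigma**2).

    E[sinh(Z)] = (E[e^Z] - E[e^-Z]) / 2 = sinh(mu) * exp(sigma**2 / 2).
    """
    return math.sinh(mu) * math.exp(sigma ** 2 / 2)

# Illustrative values for the fitted mean of asinh(y) and the error s.d.
mu, sigma = 1.5, 0.8
naive = math.sinh(mu)                    # just inverting the point forecast
adjusted = ihs_mean_forecast(mu, sigma)  # mean-based forecast of y

# Monte Carlo check of the mean of y = sinh(mu + eps), eps ~ N(0, sigma**2).
rng = random.Random(123)
draws = [math.sinh(rng.gauss(mu, sigma)) for _ in range(200_000)]
mc_mean = sum(draws) / len(draws)
print(f"naive={naive:.3f} adjusted={adjusted:.3f} simulated={mc_mean:.3f}")
```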

Update for A New Canadian Macroeconomic Database

In a post last November I discussed "A New Canadian Macroeconomic Database".

The long-term, monthly database in question was made available by Olivier Fortin-Gagnon, Maxime Leroux, Dalibor Stevanovic, and Stéphane Surprenant. Their 2018 working paper, "A Large Canadian Database for Macroeconomic Analysis", provides details and some applications of the new data.

Dalibor wrote to me yesterday to say that the database has now been updated. This is great news! Regular updates are crucial for important data repositories such as this one.

The updated database can be accessed at www.stevanovic.uqam.ca/DS_LCMD.html .

Wednesday, March 6, 2019

Forecasting From a Regression with a Square Root Dependent Variable

Back in 2013 I wrote a post that was titled, "Forecasting From Log-Linear Regressions". The basis for that post was the well-known result that if you estimate a linear regression model with the (natural) logarithm of y as the dependent variable, but you're actually interested in forecasting y itself, you don't just report the exponentials of the original forecasts. You need to add an adjustment that takes account of the connection between a Normal random variable and a log-Normal random variable, and the relationship between their means.
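Here's a minimal sketch of that adjustment, assuming log(y) = μ + ε with ε ~ N[0, σ²] and, for illustration only, known μ and σ. Then y is log-Normal, so E[y] = exp(μ + σ²/2), whereas exp(μ) is only its median. In practice μ and σ² are replaced by their estimates (e.g., the fitted value and s²); the numbers below are arbitrary.

```python
import math
import random

mu, sigma = 2.0, 0.5                        # illustrative values only
naive = math.exp(mu)                        # median of the log-Normal y
adjusted = math.exp(mu + sigma ** 2 / 2)    # mean of the log-Normal y

# Simulate y = exp(mu + eps) to see which forecast matches the mean of y.
rng = random.Random(7)
draws = [math.exp(rng.gauss(mu, sigma)) for _ in range(200_000)]
mc_mean = sum(draws) / len(draws)
print(f"naive={naive:.3f} adjusted={adjusted:.3f} simulated={mc_mean:.3f}")
```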

Today, I received a query from a blog-reader who asked how the results in that post would change if the dependent variable was the square root of y, but we wanted to forecast y itself. I'm not sure why this particular transformation was of interest, but let's take a look at the question.

In this case we can exploit the relationship between a (standard) Normal distribution and a Chi-Square distribution in order to answer the question.
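As a minimal sketch of how this works, assume √y = μ + ε with ε ~ N[0, σ²] and, for illustration, known μ and σ. Then y/σ² = ((μ + ε)/σ)² is a non-central Chi-Square variable with one degree of freedom and non-centrality parameter μ²/σ², whose mean is 1 + μ²/σ². So E[y] = μ² + σ², and simply squaring the forecast of √y understates the mean of y by σ². The numbers below are arbitrary, and estimates would replace μ and σ² in practice.

```python
import random

mu, sigma = 3.0, 1.0               # illustrative values only
naive = mu ** 2                    # square of the forecast of sqrt(y)
adjusted = mu ** 2 + sigma ** 2    # sigma^2 * (1 + mu^2 / sigma^2)

# Simulate y = (mu + eps)^2, i.e. sigma^2 times a non-central chi-square(1).
rng = random.Random(11)
draws = [(mu + rng.gauss(0.0, sigma)) ** 2 for _ in range(200_000)]
mc_mean = sum(draws) / len(draws)
print(f"naive={naive:.3f} adjusted={adjusted:.3f} simulated={mc_mean:.3f}")
```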

Some Recommended Econometrics Reading for March

This month I am suggesting some overview/survey papers relating to a variety of important topics in econometrics:

Bruns, S. B. & D. I. Stern, 2019. Lag length selection and p-hacking in Granger causality testing: prevalence and performance of meta-regression models. Empirical Economics, 56, 797-830.
Casini, A. & P. Perron, 2018. Structural breaks in time series. Forthcoming in Oxford Research Encyclopedia in Economics and Finance.
Hendry, D. F. & K. Juselius, 1999. Explaining cointegration analysis: Part I. Mimeo., Nuffield College, University of Oxford.
Hendry, D. F. & K. Juselius, 2000. Explaining cointegration analysis: Part II. Mimeo., Nuffield College, University of Oxford.
Horowitz, J., 2018. Bootstrap methods in econometrics. Cemmap Working Paper CWP53/18.
Marmer, V., 2017. Econometrics with weak instruments: Consequences, detection, and solutions. Mimeo., Vancouver School of Economics, University of British Columbia.

Sunday, February 10, 2019

A Terrific New Book on the Linear Model

Recently, it was my distinct pleasure to review a first-class book by David Harville, titled Linear Models and the Relevant Distributions and Matrix Algebra.

(Added 28 February, 2019: You can now read the published review in Statistical Papers, here.)

Here is what I had to say:

Misinterpreting Tests, P-Values, Confidence Intervals & Power

There are so many things in statistics (and hence in econometrics) that are easily, and frequently, misinterpreted. Two really obvious examples are p-values and confidence intervals.

I've devoted some space in earlier posts to each of these concepts, and their mis-use. For instance, in the case of p-values, see the posts here and here; and for confidence intervals, see here and here.

Today I was reading a great paper by Greenland et al. (2016) that deals with some common misconceptions and misinterpretations that arise not only with p-values and confidence intervals, but also with statistical tests in general and the "power" of such tests. The comments that the authors make in the abstract of their paper set the tone of what's to follow rather nicely:

"A key problem is that there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. Instead, correct use and interpretation of these statistics requires an attention to detail which seems to tax the patience of working scientists. This high cognitive demand has led to an epidemic of shortcut definitions and interpretations that are simply wrong, sometimes disastrously so - and yet these misinterpretations dominate much of the scientific literature."

The paper then goes through various common interpretations of the four concepts in question, and systematically demolishes them!

The paper is extremely readable and informative. Every econometrics student, and most applied econometricians, would benefit from taking a look!

Reference

Greenland, S., S. J. Senn, K. J. Rothman, J. B. Carlin, C. Poole, S. N. Goodman, & D. G. Altman, 2016. Statistical tests, p values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31, 337-350.

Sunday, February 3, 2019

February Reading

Now that Groundhog Day is behind us, perhaps we can focus on catching up on our reading?

Desboulets, L. D. D., 2018. A review on variable selection in regression. Econometrics, 6(4), 45.
Efron, B. & C. Morris, 1977. Stein's paradox in statistics. Scientific American, 236(5), 119-127.
Khan, W. M. & A. u I. Khan, 2018. Most stringent test of independence for time series. Communications in Statistics - Simulation and Computation, online.
Pedroni, P., 2018. Panel cointegration techniques and open challenges. Forthcoming in Panel Data Econometrics, Vol. 1: Theory, Elsevier.
Steel, M. F. J., 2018. Model averaging and its use in economics. MPRA Paper No. 90110.
Tay, A. S. & K. F. Wallis, 2000. Density forecasting: A survey. Journal of Forecasting, 19, 235-254.

Sunday, January 13, 2019

Machine Learning & Econometrics

What is Machine Learning (ML), and how does it differ from Statistics (and hence, implicitly, from Econometrics)?

Those are big questions, but I think that they're ones that econometricians should be thinking about. And if I were starting out in Econometrics today, I'd take a long, hard look at what's going on in ML.

Here's a very rough answer - it comes from a post by Larry Wasserman on his (now defunct) blog, Normal Deviate:

"The short answer is: None. They are both concerned with the same question: how do we learn from data?

But a more nuanced view reveals that there are differences due to historical and sociological reasons..........

If I had to summarize the main difference between the two fields I would say:

Statistics emphasizes formal statistical inference (confidence intervals, hypothesis tests, optimal estimators) in low dimensional problems.

Machine Learning emphasizes high dimensional prediction problems.

But this is a gross over-simplification. Perhaps it is better to list some topics that receive more attention from one field rather than the other. For example:

Statistics: survival analysis, spatial analysis, multiple testing, minimax theory, deconvolution, semiparametric inference, bootstrapping, time series.

Machine Learning: online learning, semisupervised learning, manifold learning, active learning, boosting.

But the differences become blurrier all the time........

There are also differences in terminology. Here are some examples:

Statistics          Machine Learning
------------------  --------------------
Estimation          Learning
Classifier          Hypothesis
Data point          Example/Instance
Regression          Supervised Learning
Classification      Supervised Learning
Covariate           Feature
Response            Label

Overall, the two fields are blending together more and more and I think this is a good thing."

As I said, this is only a rough answer - and it's by no means a comprehensive one.

For an econometrician's perspective on all of this you can't do better than to take a look at Frank Diebold's blog, No Hesitations. If you follow up on his posts with the label "Machine Learning" - and I suggest that you do - then you'll find 36 of them (at the time of writing).

If (legitimately) free books are your thing, then you'll find some great suggestions for reading more about the Machine Learning / Data Science field(s) on the KDnuggets website - specifically, here in 2017 and here in 2018.

Finally, I was pleased that the recent ASSA Meetings (ASSA2019) included an important contribution by Susan Athey (Stanford), titled "The Impact of Machine Learning on Econometrics and Economics". The title page for Susan's presentation contains three important links to other papers and a webcast.

Have fun!

Friday, January 11, 2019

Shout-out for Mischa Fisher

One of my former grad. students, Mischa Fisher, is currently Chief Economist and Advisor to the Governor of the State of Illinois. In this role he has oversight of a number of State agencies dealing with economics and data science.

This week, he had a really nice post on the Datascience.com blog. It's titled "10 Data Science Pitfalls to Avoid".

Mischa is very knowledgeable, and he writes extremely well. I strongly recommend that you take a look at his piece.

Monday, January 7, 2019

Bradley Efron and the Bootstrap

Econometricians make extensive use of various forms of "The Bootstrap", thanks to Bradley (Brad) Efron's pioneering work.

I've posted about the history of the bootstrap previously - e.g., here, and here.

You probably know by now that Brad was awarded The International Prize in Statistics last November - this was only the second time that this prize had been awarded. It's difficult to think of a more deserving recipient.

If you want to read an excellent account of Brad's work, and how the bootstrap came to be, I recommend the 2003 piece by Susan Holmes, Carl Morris, and Rob Tibshirani.

There are some fascinating snippets in this conversation/interview, including:

Efron: "One of the reasons I came to Stanford was because of its humor magazine. I wrote a humor column at Caltech, and I always wanted to write for a humor magazine. Stanford had a great humor magazine, The Chaparral. The first few months I was there, the editor literally went crazy and had to be hospitalized, and so I became editor. For one issue we did a parody of Playboy and it went a little too far. I was expelled from school, ..... I went away for 6 months and then I came back. That was by far the most famous I’ve ever been."

Referring to his seminal paper (Efron, 1979):

Tibshirani: "It was sent to the Annals. What kind of reception did it get?"

Efron: "Rupert Miller was the editor of the Annals at the time. I submitted what was the Rietz lecture, and it got turned down. The associate editor, who will remain nameless, said that it didn't have any theorems in it. So, I put some theorems in at the end and put a lot of pressure on Rupert, and he finally published it."

I guess there's still hope for the rest of us!

References

Efron, B., 1979. Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1-26.

Holmes, S., C. Morris, & R. Tibshirani, 2003. Bradley Efron: A conversation with good friends. Statistical Science, 18, 268-281.

© 2019, David E. Giles

Tuesday, January 1, 2019

New Year Reading Suggestions for 2019

With a new year upon us, it's time to keep up with new developments -

Basu, D., 2018. Can we determine the direction of omitted variable bias of OLS estimators? Working Paper 2018-16, Department of Economics, University of Massachusetts, Amherst.
Jiang, B., Y. Lu, & J. Y. Park, 2018. Testing for stationarity at high frequency. Working Paper 2018-9, Department of Economics, University of Sydney.
Psaradakis, Z. & M. Vavra, 2018. Normality tests for dependent data: Large-sample and bootstrap approaches. Communications in Statistics - Simulation and Computation, online.
Spanos, A., 2018. Near-collinearity in linear regression revisited: The numerical vs. the statistical perspective. Communications in Statistics - Theory and Methods, online.
Thorsrud, L. A., 2018. Words are the new numbers: A newsy coincident index of the business cycle. Journal of Business and Economic Statistics, online. (Working Paper version.)
Zhang, J., 2018. The mean relative entropy: An invariant measure of estimation error. American Statistician, online.
