Wednesday, March 20, 2019

The 2019 Econometric Game

The annual World Championship of Econometrics, The Econometric Game, is nearly upon us again!

Readers of this blog will be familiar with "The Game" from posts relating to this event in previous years. For example, see here for some 2018 coverage.

This year The Econometric Game will be held from 10 to 12 April. As usual, it is being organized by the study association for Actuarial Science, Econometrics & Operational Research (VSAE) of the University of Amsterdam. 

Teams of graduate students from around the globe will be competing for the top prize on the basis of their analysis of econometric case studies. The top three teams in 2018 were from Universidad Carlos III de Madrid, Harvard University, and Aarhus University.

Check out this year's Game, and I'll post more on it next month.

© 2019, David E. Giles

Wednesday, March 13, 2019

Forecasting After an Inverse Hyperbolic Sine Transformation

There are all sorts of good reasons why we sometimes transform the dependent variable (y) in a regression model before we start estimating. One example would be where we want to be able to reasonably assume that the model's error term is normally distributed. (This may be helpful for subsequent finite-sample inference.)

If the model has non-random regressors, and the error term is additive, then a normal error term implies that the dependent variable is also normally distributed. But it may be quite plain to us (even from simple visual observation) that the sample of data for the y variable really can't have been drawn from a normally distributed population. In that case, a functional transformation of y may be in order.
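For instance, here's a quick illustration of my own (simulated, strongly skewed data, not anything from a real application) of the sort of evidence that might prompt such a transformation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated, strongly right-skewed 'y' data - purely illustrative
y = rng.lognormal(mean=1.0, sigma=0.8, size=200)

# Jarque-Bera normality tests for y and for log(y)
jb_y, p_y = stats.jarque_bera(y)
jb_log, p_log = stats.jarque_bera(np.log(y))

print(f"y:      JB = {jb_y:.2f}, p-value = {p_y:.4f}")      # normality clearly rejected
print(f"log(y): JB = {jb_log:.2f}, p-value = {p_log:.4f}")  # typically not rejected
```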

So, suppose that we estimate a model of the form

              f(yᵢ) = β₁ + β₂xᵢ₂ + β₃xᵢ₃ + ... + βₖxᵢₖ + εᵢ ;    εᵢ ~ iid N[0, σ²] .                         (1)


where f(.) is usually a 1-1 function, so that f⁻¹(.) is uniquely defined. Examples include f(y) = log(y) (where, throughout this post, log(a) will mean the natural logarithm of 'a'), and f(y) = √(y) (if we restrict ourselves to the positive square root).

Having estimated the model, we may then want to generate forecasts of y itself, not of f(y). This is where the inverse transformation, f⁻¹(.), comes into play.
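To see why there's more to it than just applying f⁻¹, here's a small simulation of my own (not from the post itself), using the inverse hyperbolic sine as f, so that f⁻¹(.) = sinh(.). Applying f⁻¹ to the mean of f(y) doesn't recover the mean of y:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in values of f(y) = arcsinh(y), drawn from a Normal distribution
fy = rng.normal(loc=2.0, scale=0.5, size=100_000)

naive = np.sinh(np.mean(fy))     # f_inv applied to the mean of f(y)
target = np.mean(np.sinh(fy))    # the mean of y itself, which is what we want

print(f"sinh(mean of f(y)): {naive:.3f}")
print(f"mean of y:          {target:.3f}")   # noticeably larger - an adjustment is needed
```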

Saturday, March 9, 2019

Update for A New Canadian Macroeconomic Database

In a post last November I discussed "A New Canadian Macroeconomic Database".

The long-term, monthly, database in question was made available by Olivier Fortin-Gagnon, Maxime Leroux, Dalibor Stevanovic, and Stéphane Surprenant. Their 2018 working paper, "A Large Canadian Database for Macroeconomic Analysis", provides details and some applications of the new data.

Dalibor wrote to me yesterday to say that the database has now been updated. This is great news! Regular updates are crucial for important data repositories such as this one.

The updated database can be accessed at www.stevanovic.uqam.ca/DS_LCMD.html .

© 2019, David E. Giles

Wednesday, March 6, 2019

Forecasting From a Regression with a Square Root Dependent Variable

Back in 2013 I wrote a post titled "Forecasting From Log-Linear Regressions". The basis for that post was the well-known result that if you estimate a linear regression model with the (natural) logarithm of y as the dependent variable, but you're actually interested in forecasting y itself, you can't simply report the exponentials of the forecasts of log(y). You need to add an adjustment that takes account of the connection between a Normal random variable and a log-Normal random variable, and the relationship between their means.
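For readers who'd like to see the mechanics, here's a hedged Python sketch of that adjustment, using the fact that if log(y) ~ N[μ, σ²] then E[y] = exp(μ + σ²/2). The data are simulated, the estimation uses statsmodels' OLS, and σ² is replaced by the usual residual-variance estimator; the 2013 post should be consulted for the details and caveats:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(123)

# Simulated data (illustrative only): log(y) is linear in x, with Normal errors
n = 500
x = rng.uniform(0.0, 2.0, size=n)
y = np.exp(1.0 + 0.5 * x + rng.normal(0.0, 0.6, size=n))

X = sm.add_constant(x)
fit = sm.OLS(np.log(y), X).fit()
sigma2_hat = fit.mse_resid                 # estimate of the error variance

x_new = sm.add_constant(np.array([0.5, 1.0, 1.5]))
log_forecast = fit.predict(x_new)          # forecasts of log(y)

naive = np.exp(log_forecast)                           # biased for E[y | x]
adjusted = np.exp(log_forecast + 0.5 * sigma2_hat)     # log-Normal mean adjustment

print(naive)
print(adjusted)
```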

Today, I received a query from a blog-reader who asked how the results in that post would change if the dependent variable was the square root of y, but we wanted to forecast y itself. I'm not sure why this particular transformation was of interest, but let's take a look at the question.

In this case we can exploit the relationship between a (standard) Normal distribution and a Chi-Square distribution in order to answer the question.
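Without preempting the rest of the post, here's a minimal sketch of the kind of adjustment involved. It rests on the elementary fact that if √y = m + ε, with ε ~ N[0, σ²], then E[y] = m² + σ² (the non-central Chi-Square route leads to the same place). This is my own illustration with simulated data, and not necessarily the exact estimator derived in the post:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)

# Simulated data (illustrative only): sqrt(y) is linear in x, with Normal errors
n = 500
x = rng.uniform(1.0, 3.0, size=n)
sqrt_y = 2.0 + 0.7 * x + rng.normal(0.0, 0.4, size=n)
y = sqrt_y ** 2

X = sm.add_constant(x)
fit = sm.OLS(np.sqrt(y), X).fit()
sigma2_hat = fit.mse_resid                  # estimate of the error variance

x_new = sm.add_constant(np.array([1.5, 2.0, 2.5]))
sqrt_forecast = fit.predict(x_new)          # forecasts of sqrt(y)

naive = sqrt_forecast ** 2                  # just squaring the forecast of sqrt(y)
adjusted = sqrt_forecast ** 2 + sigma2_hat  # uses E[(m + e)^2] = m^2 + sigma^2

print(naive)
print(adjusted)
```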

Friday, March 1, 2019

Some Recommended Econometrics Reading for March

This month I am suggesting some overview/survey papers relating to a variety of important topics in econometrics:
  • Bruns, S. B. & D. I. Stern, 2019. Lag length selection and p-hacking in Granger causality testing: prevalence and performance of meta-regression models. Empirical Economics, 56, 797-830.
  • Casini, A. & P. Perron, 2018. Structural breaks in time series. Forthcoming in Oxford Research Encyclopedia of Economics and Finance. 
  • Hendry, D. F. & K. Juselius, 1999. Explaining cointegration analysis: Part I. Mimeo., Nuffield College, University of Oxford.
  • Hendry, D. F. & K. Juselius, 2000. Explaining cointegration analysis: Part II. Mimeo., Nuffield College, University of Oxford.
  • Horowitz, J., 2018. Bootstrap methods in econometrics. Cemmap Working Paper CWP53/18. 
  • Marmer, V., 2017. Econometrics with weak instruments: Consequences, detection, and solutions. Mimeo., Vancouver School of Economics, University of British Columbia.

© 2019, David E. Giles

Sunday, February 10, 2019

A Terrific New Book on the Linear Model

Recently, it was my distinct pleasure to review a first-class book by David Harville, titled Linear Models and the Relevant Distributions and Matrix Algebra.

(Added 28 February, 2019: You can now read the published review in Statistical Papers, here.)

Here is what I had to say:

Tuesday, February 5, 2019

Misinterpreting Tests, P-Values, Confidence Intervals & Power

There are so many things in statistics (and hence in econometrics) that are easily, and frequently, misinterpreted. Two really obvious examples are p-values and confidence intervals.

I've devoted some space in earlier posts to each of these concepts, and their mis-use. For instance, in the case of p-values, see the posts here and here; and for confidence intervals, see here and here.

Today I was reading a great paper by Greenland et al. (2016) that deals with some common misconceptions and misinterpretations that arise not only with p-values and confidence intervals, but also with statistical tests in general and the "power" of such tests. These comments by the authors in the abstract of their paper set the tone of what's to follow rather nicely:
"A key problem is that there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. Instead, correct use and interpretation of these statistics requires an attention to detail which seems to tax the patience of working scientists. This high cognitive demand has led to an epidemic of shortcut definitions and interpretations that are simply wrong, sometimes disastrously so - and yet these misinterpretations dominate much of the scientific literature." 
The paper then goes through various common interpretations of the four concepts in question, and systematically demolishes them!

The paper is extremely readable and informative. Every econometrics student, and most applied econometricians, would benefit from taking a look!


Reference

Greenland, S., S. J. Senn, K. R. Rothman, J. B. Carlin, C. Poole, S. N. Goodman, & D. G. Altman, 2016. Statistical tests, p values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31, 337-350.  

© 2019, David E. Giles

Sunday, February 3, 2019

February Reading

Now that Groundhog Day is behind us, perhaps we can focus on catching up on our reading?
  • Desboulets, L. D. D., 2018. A review on variable selection in regression analysis. Econometrics, 6(4), 45.
  • Efron, B. & C. Morris, 1977. Stein's paradox in statistics. Scientific American, 236(5), 119-127.
  • Khan, W. M. & A. u I. Khan, 2018. Most stringent test of independence for time series. Communications in Statistics - Simulation and Computation, online.
  • Pedroni, P., 2018. Panel cointegration techniques and open challenges. Forthcoming in Panel Data Econometrics, Vol. 1: Theory, Elsevier.
  • Steel, M. F. J., 2018. Model averaging and its use in economics. MPRA Paper No. 90110.
  • Tay, A. S. & K. F. Wallis, 2000. Density forecasting: A survey. Journal of Forecasting, 19, 235-254.
© 2019, David E. Giles

Sunday, January 13, 2019

Machine Learning & Econometrics

What is Machine Learning (ML), and how does it differ from Statistics (and hence, implicitly, from Econometrics)?

Those are big questions, but I think that they're ones that econometricians should be thinking about. And if I were starting out in Econometrics today, I'd take a long, hard look at what's going on in ML.

Here's a very rough answer - it comes from a post by Larry Wasserman on his (now defunct) blog, Normal Deviate:
"The short answer is: None. They are both concerned with the same question: how do we learn from data?
But a more nuanced view reveals that there are differences due to historical and sociological reasons.......... 
If I had to summarize the main difference between the two fields I would say: 
Statistics emphasizes formal statistical inference (confidence intervals, hypothesis tests, optimal estimators) in low dimensional problems. 
Machine Learning emphasizes high dimensional prediction problems. 
But this is a gross over-simplification. Perhaps it is better to list some topics that receive more attention from one field rather than the other. For example: 
Statistics: survival analysis, spatial analysis, multiple testing, minimax theory, deconvolution, semiparametric inference, bootstrapping, time series.
Machine Learning: online learning, semisupervised learning, manifold learning, active learning, boosting. 
But the differences become blurrier all the time........ 
There are also differences in terminology. Here are some examples:
Statistics        Machine Learning
----------------------------------
Estimation        Learning
Classifier        Hypothesis
Data point        Example/Instance
Regression        Supervised Learning
Classification    Supervised Learning
Covariate         Feature
Response          Label 
Overall, the two fields are blending together more and more and I think this is a good thing."
As I said, this is only a rough answer - and it's by no means a comprehensive one.

For an econometrician's perspective on all of this you can't do better than to take a look at Frank Diebold's blog, No Hesitations. If you follow up on his posts with the label "Machine Learning" - and I suggest that you do - then you'll find 36 of them (at the time of writing).

If (legitimately) free books are your thing, then you'll find some great suggestions for reading more about the Machine Learning / Data Science field(s) on the KDnuggets website - specifically, here in 2017 and here in 2018.

Finally, I was pleased that the recent ASSA Meetings (ASSA2019) included an important contribution by Susan Athey (Stanford), titled "The Impact of Machine Learning on Econometrics and Economics". The title page for Susan's presentation contains three important links to other papers and a webcast.

Have fun!

© 2019, David E. Giles

Friday, January 11, 2019

Shout-out for Mischa Fisher

One of my former grad. students, Mischa Fisher, is currently Chief Economist and Advisor to the Governor of the State of Illinois. In this role he has oversight of a number of State agencies dealing with economics and data science.

This week, he had a really nice post on the Datascience.com blog. It's titled "10 Data Science Pitfalls to Avoid".

Mischa is very knowledgeable, and he writes extremely well. I strongly recommend that you take a look at his piece.

© 2019, David E. Giles