Sunday, July 15, 2018

Handbook of Quantile Regression

Quantile regression is a powerful and flexible technique that is widely used by econometricians and other applied statisticians. In modern terms we tend to date it back to the classic paper by Koenker and Bassett (1978).

Recently, I reviewed the Handbook of Quantile Regression. This edited volume comprises a number of important, original contributions to the quantile regression literature. The various chapters cover a wide range of topics that extend the basic quantile regression set-up.

You can read my review of this book (Giles, 2018), here. I hope that it motivates you to explore this topic further.

Giles, D. E., 2018. Review of Handbook of Quantile Regression. Statistical Papers, 59, 849-850. 

Koenker, R., 2005. Quantile Regression. Cambridge University Press, Cambridge.

Koenker, R. and G. W. Bassett, 1978. Regression quantiles. Econometrica, 46, 33-50.

Koenker, R., V. Chernozhukov, X. He, & L. Peng (eds.), 2017. Handbook of Quantile Regression. Chapman & Hall/CRC, Boca Raton, FL.

© 2018, David E. Giles

Saturday, July 14, 2018

What's in a Journal Name?

Back in 2011 I put together a very light-hearted working paper titled, What's in a (Journal) Name? Here's the associated link.

That paper addressed the (obviously) important question: "Is there a correlation between the ranking of an economics journal and the length of the journal's title?"

I analyzed a sample of 159 academic economics journals. Although there was no significant association between journal quality and journal title length for the full sample of data, I did find that there was a significant “bathtub” relationship between these variables when the data were subjected to a rank correlation analysis over sub-samples. 

This led me to conclude (p.5), among other things:
'This “bathtub” relationship will undoubtedly sound alarm bells in the corridors of publishing houses as they assess proposals for new economics journals. The title, Economics, is no longer available, having been cunningly snapped up in recent years by an open-access, open-assessment e-journal which managed to “cover all of the bases” in one fell swoop. Even more recently the American Economics Association laid claim to the titles Macroeconomics and Microeconomics, albeit with an “AEA” prefix that they may wish to re-consider. The publishers of the journal, SERIEs: Journal of the Spanish Economic Association, which was launched in 2010, will no doubt ponder the merits of dropping the last six words of its title. However, there is hope. The title Econometrica has been spoken for since 1933, but to the best of our knowledge the more worldly journal title Econometrics is still available. Publishers should register their interest forthwith!'
As usual the latter remark proved to be safe advice on my part! I wonder if my subsequent invitation to join the Editorial Board of Econometrics was some sort of reward? 

I'll probably never know!

Friday, July 13, 2018

More on Regression Coefficient Interpretation

I get a lot of direct email requests from people wanting help/guidance/advice of various sorts about some aspect of econometrics or other. I like being able to help when I can, but these requests can lead to some pitfalls -  for both of us.

More on that in a moment. Meantime, today I got a question from a Ph.D student, "J", which was essentially the following:

" Suppose I have the following regression model

             log(yi) = α + βXi + εi    ;  i = 1, 2, ...., n .

How do I interpret the (estimated) value of β?"

I think most of you will know that the answer is:

"If X changes by one unit, then y changes by (100*β)%".

If you didn't know this, then some trivial partial differentiation will confirm it. And after all, isn't partial differentiation something that grad. students in ECON should be good at?


      β = [∂log(yi) / ∂Xi] = [∂log(yi) / ∂yi][∂yi / ∂Xi] = [∂yi / ∂Xi] / yi,

which is the proportional change in y for a unit change in X. Multiplying by 100 puts the answer into percentage terms.
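
As a quick numerical check of this interpretation, here is a minimal sketch. The coefficient values are hypothetical, chosen only for illustration, and the error term is ignored. It compares the derivative-based figure, 100*β, with the exact percentage change implied by the model for a full one-unit change in X, namely 100*[exp(β) - 1]. The two agree closely when β is small, and drift apart as β grows.

```python
import math

def approx_pct_change(beta):
    """Derivative-based percentage change in y for a one-unit change in X."""
    return 100.0 * beta

def exact_pct_change(beta):
    """Exact percentage change in y implied by log(y) = alpha + beta*X
    when X increases by one unit (error term held fixed)."""
    return 100.0 * (math.exp(beta) - 1.0)

# Hypothetical coefficient values, for illustration only
for beta in (0.01, 0.05, 0.20):
    print(f"beta = {beta:.2f}: approx = {approx_pct_change(beta):.3f}%, "
          f"exact = {exact_pct_change(beta):.3f}%")
```

The gap between the two columns is the usual reminder that the (100*β)% reading is a small-change approximation.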

So, I responded to "J" accordingly.

So far, so good.

But then I got a response:

"Actually, my model includes an interaction term, and really it looks like this:

    log(yi) = α + βXi + γ [XiΔlog(Zi)] + εi    ;  i = 1, 2, ...., n.

How do I interpret β?"

Whoa! That's not the question that was first asked - and now my previous answer (given in good faith) is totally wrong! 

Let's do some partial differentiation again, with this full model. We still have:

[∂log(yi) / ∂Xi] = [∂log(yi) / ∂yi][∂yi / ∂Xi] = [∂yi / ∂Xi] / yi.

However, this expression now equals [β + γ Δlog(Zi)].

So, a one unit change in X leads to a percentage change in y that's equal to 100*[β + γ Δlog(Zi)]%.

This percentage change is no longer constant - it varies as Z takes on different sample values. If you wanted to report a single value you could evaluate the expression using the estimates for β and γ, and either the sample average, or sample median, value for Δlog(Z).
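
To make that concrete, here is a small sketch of the calculation. Everything numerical in it is an assumption for illustration: the estimates beta_hat and gamma_hat, and the sample values of Δlog(Z), are made up. It evaluates 100*[β + γ Δlog(Z)] at each sample point, and then at the sample mean and median of Δlog(Z).

```python
from statistics import mean, median

def pct_effect_of_x(beta, gamma, dlog_z):
    """Percentage change in y for a one-unit change in X,
    evaluated at a given value of Δlog(Z)."""
    return 100.0 * (beta + gamma * dlog_z)

# Hypothetical estimates and sample values, for illustration only
beta_hat, gamma_hat = 0.04, 0.50
dlog_z_sample = [0.01, 0.02, -0.01, 0.03, 0.10]

# The effect differs from observation to observation
effects = [pct_effect_of_x(beta_hat, gamma_hat, d) for d in dlog_z_sample]

# Single summary values, evaluated at the sample mean and median of Δlog(Z)
at_mean = pct_effect_of_x(beta_hat, gamma_hat, mean(dlog_z_sample))
at_median = pct_effect_of_x(beta_hat, gamma_hat, median(dlog_z_sample))

print([round(e, 2) for e in effects])
print(round(at_mean, 2), round(at_median, 2))
```

With skewed values of Δlog(Z), as in this made-up sample, the mean-based and median-based summaries can differ noticeably, so it's worth saying which one is being reported.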

This illustrates one of the difficulties that I face sometimes. I try to respond to a question, but I really don't know if the question being asked is the appropriate one; or if it's been taken out of context; or if the information I'm given is complete or not.

If you're a grad. student, then discussing your question in person with your supervisor should be your first step!

Friday, July 6, 2018

Interpreting Dummy Variable Coefficients After Non-Linear Transformations

Dummy variables - ones that take only the values zero and one - are commonly used as regressors in regression models. I've devoted several posts to discussing various aspects of such variables, notably here, but also here, here, and here.

When the regression model in question is linear, in both the variables and the parameters, the interpretation of the coefficient of such a dummy variable is simple. Suppose that the model takes the form:

    yi = α + β Di + Σj γj Xji + ε    ;     E(ε) = 0   ;   i = 1, ...., n.                          (1)

The range of summation in the term on the right-hand side of (1) is from 1 to k, if there are k regressors in addition to the dummy variable, D. (There is no loss of generality in assuming a single dummy regressor in what follows, and no further distributional assumptions about the error term will be needed or used.)

As you'll know, if Di = 0, then the intercept coefficient in (1) is just α; and it shifts to (α + β) if Di = 1. It changes by an amount equal to β, and so does the predicted mean value of y. Conversely, this amount changes by -β if Di changes from 1 to 0. Estimating (1) by OLS will give us an estimate of the effect on y of Di switching from 0 to 1 in value, or vice versa.

But a bit more on estimation issues below!

Another way of interpreting what is going on is to think about the growth rate in the expected value of y that is implied when D changes its value. Setting Di = 0, and then Di = 1, this growth rate is:

   g01i = [(α + β + Σj γj Xji) - (α + Σj γj Xji)] / (α + Σj γj Xji) = [β / (α + Σj γj Xji)] ,

which you can multiply by 100 to convert it into a percentage rate of growth, if you wish. 

Note that this growth rate depends on the other parameters in the model, and also on the sample values for the other regressors.

Conversely, when D changes in value from 1 to 0, this growth rate is different, namely:

   g10i = - [β / (α + β + Σj γj Xji)]                            (i = 1, ...., n).

In this fully linear model these growth rates offer a somewhat less appealing way of summarizing what is going on than does the amount of change in the expected value of y. The latter doesn't depend on the other parameters of the model, or on the sample values of the regressors.
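
The contrast between the two summaries can be sketched in a few lines. The parameter and regressor values used here are hypothetical, and the function names are mine. The growth rate for D going from 0 to 1 is β divided by (α + Σj γj Xji), while the growth rate for D going from 1 to 0 is -β divided by (α + β + Σj γj Xji).

```python
def expected_y(alpha, beta, gammas, xs, d):
    """Expected value of y in the linear model for dummy value d (0 or 1)."""
    return alpha + beta * d + sum(g * x for g, x in zip(gammas, xs))

def growth_0_to_1(alpha, beta, gammas, xs):
    """Growth rate in E(y) when D switches from 0 to 1."""
    return beta / expected_y(alpha, beta, gammas, xs, 0)

def growth_1_to_0(alpha, beta, gammas, xs):
    """Growth rate in E(y) when D switches from 1 to 0."""
    return -beta / expected_y(alpha, beta, gammas, xs, 1)

# Hypothetical parameter values and two different observations
alpha, beta, gammas = 10.0, 2.0, [0.5]
for xs in ([4.0], [20.0]):
    change = expected_y(alpha, beta, gammas, xs, 1) - expected_y(alpha, beta, gammas, xs, 0)
    print(xs, change, growth_0_to_1(alpha, beta, gammas, xs),
          growth_1_to_0(alpha, beta, gammas, xs))
```

The amount of change is β for every observation, whereas both growth rates move with the regressor values.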

However, this situation can change very quickly once we move to a regression model that is non-linear, either in the variables or in the parameters (or both). 

That's what I want to focus on in this post. 

Let's consider some interesting examples that involve common transformations of the dependent variable in a regression model. Apart from anything else, such transformations are often undertaken to make the assumption of a normally distributed error term more reasonable.

Monday, July 2, 2018

Some Reading Suggestions for July

Some summertime reading:
  • Chen, T., J. DeJuan, & R. Tian, 2018. Distributions of GDP across versions of the Penn World Tables: A functional data analysis approach. Economics Letters, in press.
  • Clements, K. W., H. Liu, & Y. Tarverdi, 2018. Alcohol consumption, censorship and misjudgment. Applied Economics, online.
  • Jin, H., S. Zhang, J. Zhang, & H. Hao, 2018. Modified tests for change points in variance in the possible presence of mean breaks. Journal of Statistical Computation and Simulation, online.
  • Pata, U. K., 2018. The Feldstein Horioka puzzle in E7 countries: Evidence from panel cointegration and asymmetric causality analysis. Journal of International Trade and Economic Development, online.
  • Sen, A., 2018. A simple unit root testing methodology that does not require knowledge regarding the presence of a break. Communications in Statistics - Simulation and Computation, 47, 871-889.
  • Wright, T., M. Klein, & K. Wieczorek, 2018. A primer on visualizations for comparing populations, including the issue of overlapping confidence intervals. American Statistician, online.

Sunday, July 1, 2018

Dummy Variables in a Semilogarithmic Regression: Exact Distributional Results

For better or worse, semilogarithmic regression models are used a lot in empirical economics. 

It would be nice to think that this is because the researcher found that a logarithmic transformation of the model's dependent variable led to residuals that were more "normally" distributed than without the transformation. Unfortunately, however, it's often just "for convenience". With this transformation, the estimates of the regression coefficients have a simple interpretation, as explained below.

I hate it when the latter situation arises. I've long since lost track of the number of times I've been at a seminar where the speaker has used this "simple interpretation" as an excuse for their choice of a semilogarithmic regression specification. For goodness sake, the choice of the model's functional form should be based on more than "convenience"!

For some of my previous comments about this point, see this post.

Most of you will know that when our semilogarithmic model includes a dummy (zero-one) regressor, we have to be careful about how we interpret that regressor's estimated coefficient. Suppose that we have the following regression model, where D is a dummy variable, and the X's are regressors that are measured "continuously":

   ln(yi) = α + β Di + Σj γj Xji + ε    ;     E(ε) = 0   ;   i = 1, ...., n.                         

Note that there's no loss of generality here in having just one dummy variable in the model.

Then, the interpretation of the regression coefficients is:
  1. A one-unit change in Xj leads to a proportional change of  γj (or a percentage change of 100γj) in y.
  2. When the dummy variable changes from D = 0 to D = 1, the proportional change in y is [exp(β) -1]. Conversely, going from D = 1 to D = 0 implies a proportional change in y of  [exp(-β) -1]. Again, multiply by 100 to get a percentage change.
See Halvorsen and Palmquist (1980) for an explanation of the second of these results, and my comments in this earlier post.

Kennedy (1981) and Giles (1982) discuss the issue of estimating this proportional change in the case of the dummy variable. Their results relate to point estimation - with a focus on unbiased estimation of the proportional change, when the model's errors are normally distributed.
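
Here is a minimal sketch of these point calculations. The first function simply evaluates [exp(β) - 1] and [exp(-β) - 1]. The second applies the adjustment usually attributed to Kennedy (1981), exp(β̂ - 0.5 V̂(β̂)) - 1, where V̂(β̂) is the estimated variance of the OLS estimator of β. That adjustment reduces the bias of the "plug-in" estimate but does not remove it entirely. The numerical inputs are hypothetical, and the function names are mine.

```python
import math

def proportional_change(beta, zero_to_one=True):
    """Proportional change in y when the dummy switches value:
    exp(beta) - 1 for D: 0 -> 1, and exp(-beta) - 1 for D: 1 -> 0."""
    return math.exp(beta) - 1.0 if zero_to_one else math.exp(-beta) - 1.0

def kennedy_estimate(beta_hat, var_beta_hat):
    """Bias-adjusted estimate of exp(beta) - 1, using the estimated
    variance of beta_hat (the adjustment attributed to Kennedy, 1981)."""
    return math.exp(beta_hat - 0.5 * var_beta_hat) - 1.0

# Hypothetical OLS output, for illustration only
beta_hat, var_beta_hat = 0.25, 0.01

print(100 * proportional_change(beta_hat))         # D: 0 -> 1, in percent
print(100 * proportional_change(beta_hat, False))  # D: 1 -> 0, in percent
print(100 * kennedy_estimate(beta_hat, var_beta_hat))
```

Notice that the two directions of change are not mirror images of each other, and that neither equals 100*β.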

But what about interval estimation of this effect? 

Tuesday, June 19, 2018

Shout-Out for Marc Bellemare

If you don't follow Marc Bellemare's blog (shame on you - you should!), then you may not have caught up with his recent posts relating to his series of lectures on "Advanced Econometrics - Causal Inference With Observational Data" at the University of Copenhagen in May of this year.

Marc is keeping us all on tenterhooks by "releasing" the slides for these lectures progressively - smart move!

So far, the first five of the eight lectures in the series are now available for downloading:
I'm looking forward to seeing the rest of these terrific lectures.

Thanks for sharing them, Marc.

Wednesday, June 6, 2018

The Series of Unsurprising Results in Economics (SURE)

Andrea Menclova of the University of Canterbury (New Zealand) has recently founded the SURE Journal, whose aims and scope are as follows:

'The Series of Unsurprising Results in Economics (SURE) is an e-journal of high-quality research with “unsurprising” findings. We publish scientifically important and carefully-executed studies with statistically insignificant or otherwise unsurprising results. Studies from all fields of Economics will be considered. SURE is an open-access journal and there are no submission charges. (My emphasis, DG.)

SURE benefits readers by:
  • Mitigating the publication bias and thus complementing other journals in an effort to provide a complete account of the state of affairs;
  • Serving as a repository of potential (and tentative) “dead ends” in Economics research.

SURE benefits writers by:
  • Providing an outlet for interesting, high-quality, but “risky” (in terms of uncertain results) research projects;
  • Decreasing incentives to data-mine, change theories and hypotheses ex post or exclusively focus on provocative topics.'

To find out more or to submit a manuscript, visit:

This is a novel venture that has a lot to offer at a time when research replicability and publication bias are (rightly) receiving so much attention.

I'm delighted to be associated with the new journal as a member of its Editorial Board.

Friday, June 1, 2018

Suggested Reading for June

Thursday, May 31, 2018

The Uniqueness of the Cointegrating Vector

Suppose that we have (only) two non-stationary time-series, X1t and X2t (t = 1, 2, 3, .....). More specifically, suppose that both of these series are integrated of order one (i.e., I(1)). Then there are two possibilities - either X1 and X2 are cointegrated, or they aren't.

You'll recall that if they are cointegrated, then there is a linear combination of X1 and X2 that is stationary. Let's write this linear combination as Zt = (X1t + αX2t). (We can normalize the first "weight" to the value "one" without any loss of generality.) The vector whose elements are 1 and α is the so-called "cointegrating vector".

You may be aware that if such a vector exists, then it is unique.

Recently, I was asked for a simple proof of this uniqueness. Here goes.........
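
One standard argument runs by contradiction. What follows is a sketch under the set-up described above, and not necessarily the proof given in the full post. Suppose that there were two cointegrating vectors, (1, α1) and (1, α2), with α1 ≠ α2. Then:

```latex
\begin{align*}
Z_{1t} &= X_{1t} + \alpha_1 X_{2t} \sim I(0), \\
Z_{2t} &= X_{1t} + \alpha_2 X_{2t} \sim I(0), \\
Z_{1t} - Z_{2t} &= (\alpha_1 - \alpha_2)\, X_{2t}.
\end{align*}
```

The left-hand side of the last line is a linear combination of two stationary series, so it is stationary. The right-hand side is a non-zero multiple of X2t, which is I(1) and hence non-stationary. This is a contradiction, so we must have α1 = α2.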

Thursday, April 26, 2018

Results of the Econometric Game, 2018

In a recent post I mentioned the 2018 "edition" of The Econometric Game, which was held in Amsterdam earlier this month.

In random order, the finalists, after the first two days of competition, were the teams representing:

Aarhus University
Erasmus Universiteit Rotterdam
Harvard University
Lund University
McGill University
Universiteit van Tilburg
Universiteit van Amsterdam
University Carlos III Madrid
University of Bristol
University of Toronto

These teams then competed in a further one-day event.

The team from University Carlos III Madrid emerged the winner, with those from Harvard University and Aarhus University taking second and third places respectively.

The organizers of The Game have provided a gallery of photos here.

Congratulations to all involved for another impressive event!

Wednesday, April 25, 2018

April Reading

Very belatedly, here is my list of suggested reading for April:
  • Biørn, E., 2017. Identification, instruments, omitted variables, and rudimentary models: Fallacies in the "experimental approach" to econometrics. Memorandum No. 13/2017, Department of Economics, Oslo University.
  • Chambers, M. J., and M. Kyriacou, 2018. Jackknife bias reduction in the presence of a near-unit root. Econometrics, 6, 11.
  • Derryberry, D., K. Aho, J. Edwards, and T. Peterson, 2018. Model selection and regression t-statistics. American Statistician, in press.
  • Mitchell, J., D. Robertson, and S. Wright, 2018. R2 bounds for predictive models: What univariate properties tell us about multivariate predictability. Journal of Business and Economic Statistics, in press. (Free download here.)
  • Parker, T., 2017. Finite-sample distributions of the Wald, likelihood ratio, and Lagrange multiplier test statistics in the classical linear model. Communications in Statistics - Theory and Methods, 46, 5195-5202.
  • Troster, V., 2018. Testing Granger-causality in quantiles. Econometric Reviews, 37, 850-866.

Monday, March 19, 2018

The (Undergraduate) (Econo) Metrics Game

In a comment on my recent post about the long-running Econometrics Game for graduate student teams, "BJH" kindly pointed out the existence of a counterpart for undergraduate econometrics students.

The "Metrics Game" is a two-day competition organised by OEconomica in association with the University of Chicago’s Department of Economics and the Becker Friedman Institute. 

The 2018 competition is the fourth in the series, and gets underway on 7 April at the University of Chicago.

It's great to see competitions of this type being made available for students at all levels of study.

Sunday, March 18, 2018

The Econometric Game, 2018

Readers of this blog will be familiar with The Econometric Game. You'll find my posts about the 2016 and 2017 Games here, and here. The first of those posts links to ones about the Games from previous years.

The Econometric Game is a competition between teams of graduate students in econometrics. It's organised by the study association for Actuarial Science, Econometrics & Operational Research (VSAE) of the University of Amsterdam, and it has been a terrific success.

The Econometric Game has been held annually since 1999. This year, 30 teams have been chosen to compete in the Games, which will be held in Amsterdam from 11 to 13 April. The theme for this year's competition is "Econometrics of Happiness".

The winners in both 2016 and 2017 were teams representing Harvard University. Let's see how they perform this year. I'll have some follow-up posts once the Game gets underway next month.

Wednesday, February 21, 2018

March Reading List

  • Annen, K. & S. Kosempel, 2018. Why aid-to-GDP ratios? Discussion Paper 2018-01, Department of Economics and Finance, University of Guelph.
  • Conover, W. J., A. J. Guerrero-Serrano, & V. G. Tercero-Gomez, 2018. An update on 'a comparative study of tests for homogeneity of variance'. Journal of Statistical Computation and Simulation, online.
  • Foroni, C., M. Marcellino, & D. Stevanović, 2018. Mixed frequency models with MA components. Discussion Paper  No. 02/2018, Deutsche Bundesbank.
  • Sen, A., 2018. Lagrange multiplier unit root test in the presence of a break in the innovation variance. Communications in Statistics - Theory and Methods, 47, 1580-1596.
  • Stewart, K. G., 2018. Suits' watermelon model: The missing simultaneous equations empirical example. Mimeo., Department of Economics, University of Victoria.
  • Weigt, T. & B. Wilfling, 2018. An approach to increasing forecast-combination accuracy through VAR error modeling. Paper 68/2018, Department of Economics, University of Münster.

Sunday, February 11, 2018

Recommended Reading for February

Here are some reading suggestions:
  • Bruns, S. B., Z. Csereklyei, & D. I. Stern, 2018. A multicointegration model of global climate change. Discussion Paper No. 336, Center for European, Governance and Economic Development Research, University of Goettingen.
  • Catania, L. & S. Grassi, 2017. Modelling crypto-currencies financial time-series. CEIS Tor Vergata, Research Paper Series, Vol. 15, Issue 8, No. 417.
  • Farbmacher, H., R. Guber, & J. Vikström, 2018. Increasing the credibility of the twin birth instrument. Journal of Applied Econometrics, online.
  • Liao, J. G. & A. Berg, 2018. Sharpening Jensen's inequality. American Statistician, online.
  • Reschenhofer, E., 2018. Heteroscedasticity-robust estimation of autocorrelation. Communications in Statistics - Simulation and Computation, online.

Saturday, February 10, 2018

Economic Goodness-of-Fit

What do we mean by a "significant result" in econometrics?

The distinction between "statistical significance" and "economic significance" has received a good deal of attention in the literature. And rightly so.

Think about the estimated coefficients in a regression model, for example. Putting aside the important issue of the choice of a significance level when considering statistical significance, we all know that results that are significant in the latter sense may or may not be 'significant' when their economic impact is considered.

Marc Bellemare provided a great discussion of this in his blog a while back.

Here, I want to draw attention to a somewhat related issue - distinguishing between the statistical and economic overall goodness-of-fit of an economic model.

Thursday, February 8, 2018

ASA Symposium on Statistical Inference - Recorded Sessions

In October of last year, the American Statistical Association held a two-day Symposium on Statistical Inference in Bethesda, MD.

The symposium was sub-titled, Scientific Method for the 21st. Century: A World Beyond p < 0.05. That gives you some idea of what it was about.

The ASA has now released video recordings of several of the sessions at the symposium, and you can find them here.

The video sessions include:

"Why Is Eliminating P-Values So Hard? Reflections on Science and Statistics." (Steve Goodman)

"What Have We (Not) Learnt from Millions of Scientific Papers with P-Values?" (John Ioannidis)

"Understanding the Needs for Statistical Evidence of Decision-Makers in Medicine." (Madhu Mazumdar, Keren Osman, & Elizabeth Garrett-Mayer) 

"Statisticians: Sex Symbols, Liars, Both, or Neither?" (Christie Aschwanden, Laura Helmuth, & Aviva Hope Rutkin) 

"The Radical Prescription for Change." (Andrew Gelman, Marcia McNutt, & Xiao-Li Meng)

Closing Session: “Take the Mic”

The videos are stimulating and timely. I hope that you enjoy them.

Saturday, February 3, 2018

Bayesian Econometrics Slides

Over the years, I included material on Bayesian Econometrics in various courses that I taught - especially at the grad. level. I retired from teaching last year, and I thought that some of you might be interested in the slides that I used when I taught a Bayesian Econometrics topic for the last time.

I hope that you find them useful.

1. General Background
2. Constructing Prior Distributions
3. Properties of Bayes Estimators and Tests
4. Bayesian Inference for the Linear Regression Model
5. Bayesian Computation
6. More Bayesian Computation 
7. Acceptance-Rejection Sampling
8. The Metropolis-Hastings Algorithm
9. Model Selection - Theory
10. Model Selection - Applications
11. Consumption Function Case Study

Tuesday, January 2, 2018

Econometrics Reading for the New Year

Another year, and lots of exciting reading!
  • Davidson, R. & V. Zinde-Walsh, 2017. Advances in specification testing. Canadian Journal of Economics, online.
  • Dias, G. F. & G. Kapetanios, 2018. Estimation and forecasting in vector autoregressive moving average models for rich datasets. Journal of Econometrics, 202, 75-91.  
  • González-Estrada, E. & J. A. Villaseñor, 2017. An R package for testing goodness of fit: goft. Journal of Statistical Computation and Simulation, 88, 726-751.
  • Hajria, R. B., S. Khardani, & H. Raïssi, 2017. Testing the lag length of vector autoregressive models: A power comparison between portmanteau and Lagrange multiplier tests. Working Paper 2017-03, Escuela de Negocios y Economía, Pontificia Universidad Católica de Valparaíso.
  • McNown, R., C. Y. Sam, & S. K. Goh, 2018. Bootstrapping the autoregressive distributed lag test for cointegration. Applied Economics, 50, 1509-1521.
  • Pesaran, M. H. & R. P. Smith, 2017. Posterior means and precisions of the coefficients in linear models with highly collinear regressors. Working Paper BCAM 1707, Birkbeck, University of London.
  • Yavuz, F. V. & M. D. Ward, 2017. Fostering undergraduate data science. American Statistician, online. 

Monday, January 1, 2018

Interpolating Statistical Tables

We've all experienced it. You go to use a statistical table - Standard Normal, Student-t, F, Chi Square - and the line that you need simply isn't there in the table. That's to say the table simply isn't detailed enough for our purposes.

One question that always comes up when students are first being introduced to such tables is:
"Do I just interpolate linearly between the nearest entries on either side of the desired value?"
Not that these exact words are used, typically. For instance, a student might ask if they should take the average of the two closest values. How should you respond?
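
To get a feel for what is at stake, here is a small sketch using the Standard Normal distribution only. It relies on Python's built-in statistics.NormalDist for the "exact" values; the grid spacings and the target values are my own choices for illustration. It interpolates linearly between two neighbouring table entries and compares the result with the exact probability. At the midpoint between two entries, linear interpolation is the same thing as taking the average of the two closest values.

```python
from statistics import NormalDist

def linear_interpolate(x, x0, y0, x1, y1):
    """Linear interpolation between the points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

phi = NormalDist().cdf  # exact Standard Normal c.d.f.

# A finely tabulated case: entries at z = 1.64 and z = 1.65, target z = 1.645
fine_approx = linear_interpolate(1.645, 1.64, phi(1.64), 1.65, phi(1.65))
fine_error = abs(fine_approx - phi(1.645))

# A coarsely tabulated case: entries at z = 1.5 and z = 2.0, target z = 1.75
coarse_approx = linear_interpolate(1.75, 1.5, phi(1.5), 2.0, phi(2.0))
coarse_error = abs(coarse_approx - phi(1.75))

print(f"fine grid error:   {fine_error:.2e}")
print(f"coarse grid error: {coarse_error:.2e}")
```

The error from linear interpolation depends on how curved the tabulated function is between the two entries, and on how far apart those entries are. With a fine grid it is negligible; with a coarse one it may not be.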