Sunday, July 15, 2018

Handbook of Quantile Regression

Quantile regression is a powerful and flexible technique that is widely used by econometricians and other applied statisticians. In modern terms we tend to date it back to the classic paper by Koenker and Bassett (1978).

Recently, I reviewed the Handbook of Quantile Regression. This edited volume comprises a number of important original contributions to the quantile regression literature. The various chapters cover a wide range of topics that extend the basic quantile regression set-up.

You can read my review of this book (Giles, 2018), here. I hope that it motivates you to explore this topic further.

Giles, D. E., 2018. Review of Handbook of Quantile Regression. Statistical Papers, 59, 849-850. 

Koenker, R., 2005. Quantile Regression. Cambridge University Press, Cambridge.

Koenker, R. and G. W. Bassett, 1978. Regression quantiles. Econometrica, 46, 33-50.

Koenker, R., V. Chernozhukov, X. He, and L. Peng (eds.), 2017. Handbook of Quantile Regression. Chapman & Hall/CRC, Boca Raton, FL.

© 2018, David E. Giles

Saturday, July 14, 2018

What's in a Journal Name?

Back in 2011 I put together a very light-hearted working paper titled, What's in a (Journal) Name? Here's the associated link.

That paper addressed the (obviously) important question: "Is there a correlation between the ranking of an economics journal and the length of the journal's title?"

I analyzed a sample of 159 academic economics journals. Although there was no significant association between journal quality and journal title length for the full sample of data, I did find that there was a significant “bathtub” relationship between these variables when the data were subjected to a rank correlation analysis over sub-samples. 

This led me to conclude (p. 5), among other things:
'This “bathtub” relationship will undoubtedly sound alarm bells in the corridors of publishing houses as they assess proposals for new economics journals. The title, Economics, is no longer available, having been cunningly snapped up in recent years by an open-access, open-assessment e-journal which managed to “cover all of the bases” in one fell swoop. Even more recently the American Economics Association laid claim to the titles Macroeconomics and Microeconomics, albeit with an “AEA” prefix that they may wish to re-consider. The publishers of the journal, SERIEs: Journal of the Spanish Economic Association, which was launched in 2010, will no doubt ponder the merits of dropping the last six words of its title. However, there is hope. The title Econometrica has been spoken for since 1933, but to the best of our knowledge the more worldly journal title Econometrics is still available. Publishers should register their interest forthwith!'
As usual, the latter remark proved to be safe advice on my part! I wonder if my subsequent invitation to join the Editorial Board of Econometrics was some sort of reward? 

I'll probably never know!

© 2018, David E. Giles

Friday, July 13, 2018

More on Regression Coefficient Interpretation

I get a lot of direct email requests from people wanting help/guidance/advice of various sorts about some aspect of econometrics or other. I like being able to help when I can, but these requests can lead to some pitfalls - for both of us.

More on that in a moment. Meantime, today I got a question from a Ph.D. student, "J", which was essentially the following:

" Suppose I have the following regression model

             log(yi) = α + βXi + εi    ;  i = 1, 2, ...., n .

How do I interpret the (estimated) value of β?"

I think most of you will know that the answer is:

"If X changes by one unit, then y changes by (100*β)%".

If you didn't know this, then some trivial partial differentiation will confirm it. And after all, isn't partial differentiation something that grad. students in ECON should be good at?


      β = [∂log(yi) / ∂Xi] = [∂log(yi) / ∂yi][∂yi / ∂Xi] = [∂yi / ∂Xi] / yi ,

which is the proportional change in y for a unit change in X. Multiplying by 100 puts the answer into percentage terms.
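To make this concrete, here's a quick numerical check in Python. The coefficient values are made up purely for illustration; the point is that 100*β is the calculus-based (small-change) answer, and it sits close to the exact discrete percentage change when β is small:

```python
import math

# Hypothetical coefficients for log(y) = alpha + beta * X (illustration only).
alpha, beta = 1.0, 0.05

def y(x):
    """Value of y implied by the semilog model at regressor value x."""
    return math.exp(alpha + beta * x)

# Exact percentage change in y when X rises by one unit:
exact_pct = 100 * (y(3.0) / y(2.0) - 1)   # = 100 * (exp(beta) - 1)

# The calculus-based answer discussed above:
approx_pct = 100 * beta

print(round(exact_pct, 3))    # exact discrete change
print(round(approx_pct, 3))   # derivative-based approximation
```

With β = 0.05 the two answers differ only in the second decimal place; the gap widens as β grows, which is exactly the issue that arises with dummy variables in the post below dated July 1.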

So, I responded to "J" accordingly.

So far, so good.

But then I got a response:

"Actually, my model includes an interaction term, and really it looks like this:

    log(yi) = α + βXi + γ [XiΔlog(Zi)] + εi    ;  i = 1, 2, ...., n.

How do I interpret β?"

Whoa! That's not the question that was first asked - and now my previous answer (given in good faith) is totally wrong! 

Let's do some partial differentiation again, with this full model. We still have:

[∂log(yi) / ∂Xi] = [∂log(yi) / ∂yi][∂yi / ∂Xi] = [∂yi / ∂Xi] / yi.

However, this expression now equals [β + γ Δlog(Zi)].

So, a one-unit change in X leads to a percentage change in y that's equal to 100*[β + γ Δlog(Zi)]%.

This percentage change is no longer constant - it varies as Z takes on different sample values. If you wanted to report a single value you could evaluate the expression using the estimates for β and γ, and either the sample average, or sample median, value for Δlog(Z).
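A small Python sketch makes the point. The parameter values and the sample values of Δlog(Z) below are entirely hypothetical, not estimates from any real data set:

```python
# Illustrative (made-up) coefficient estimates:
beta, gamma = 0.04, 0.10

# Made-up sample values of Δlog(Z_i):
dlogZ = [0.01, 0.03, -0.02, 0.05, 0.02]

# Per-observation percentage effect of a one-unit change in X:
effects = [100 * (beta + gamma * d) for d in dlogZ]

# A single summary number, evaluated at the sample mean of Δlog(Z):
mean_d = sum(dlogZ) / len(dlogZ)
effect_at_mean = 100 * (beta + gamma * mean_d)

print(effects)                      # varies observation by observation
print(round(effect_at_mean, 3))     # one summary value
```

Notice that `effects` differs across observations, which is precisely why a single reported number has to be evaluated at some chosen value of Δlog(Z), such as its sample mean or median.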

This illustrates one of the difficulties that I face sometimes. I try to respond to a question, but I really don't know if the question being asked is the appropriate one; or if it's been taken out of context; or if the information I'm given is complete or not.

If you're a grad. student, then discussing your question in person with your supervisor should be your first step!

© 2018, David E. Giles

Friday, July 6, 2018

Interpreting Dummy Variable Coefficients After Non-Linear Transformations

Dummy variables - ones that take only the values zero and one - are commonly used as regressors in regression models. I've devoted several posts to discussing various aspects of such variables, notably here, but also here, here, and here.

When the regression model in question is linear, in both the variables and the parameters, the interpretation of coefficient of such a dummy variable is simple. Suppose that the model takes the form:

    yi = α + β Di + Σj γj Xji + εi    ;     E(εi) = 0   ;   i = 1, ...., n.                          (1)

The range of summation in the term on the right-hand side of (1) is from 1 to k, if there are k regressors in addition to the dummy variable, D. (There is no loss of generality in assuming a single dummy regressor in what follows, and no further distributional assumptions about the error term will be needed or used.)

As you'll know, if Di = 0, then the intercept coefficient in (1) is just α; and it shifts to (α + β) if Di = 1. It changes by an amount equal to β, and so does the predicted mean value of y. Conversely, this amount changes by -β if Di changes from 1 to 0. Estimating (1) by OLS will give us an estimate of the effect on y of Di switching from 0 to 1 in value, or vice versa.

But a bit more on estimation issues below!

Another way of interpreting what is going on is to think about the growth rate in the expected value of y that is implied when D changes its value. Setting Di = 0, and then Di = 1, this growth rate is:

   g01i = [(α + β + Σj γj Xji) - (α + Σj γj Xji)] / (α + Σj γj Xji) = [β / (α + Σj γj Xji)] ,

which you can multiply by 100 to convert it into a percentage rate of growth, if you wish. 

Note that this growth rate depends on the other parameters in the model, and also on the sample values for the other regressors.

Conversely, when D changes in value from 1 to 0, this growth rate is different, namely:

   g10i = - [β / (α + β + Σj γj Xji)]                            (i = 1, ...., n).

In this fully linear model these growth rates offer a somewhat less appealing way of summarizing what is going on than does the amount of change in the expected value of y. The latter doesn't depend on the other parameters of the model, or on the sample values of the regressors.
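Here's a minimal numerical sketch of the two growth rates, for a single (made-up) observation. The values of α, β, and the regressor sum are hypothetical, chosen only to show that the 0→1 and 1→0 rates are not mirror images of one another:

```python
# Hypothetical parameter values and one observation's regressor sum:
alpha, beta = 2.0, 0.5
xb = 3.0   # stands in for Σ_j γ_j X_ji for observation i

base = alpha + xb            # E(y) when D = 0
shifted = alpha + beta + xb  # E(y) when D = 1

g01 = (shifted - base) / base        # = beta / (alpha + Σ γ X)
g10 = (base - shifted) / shifted     # = -beta / (alpha + beta + Σ γ X)

print(round(100 * g01, 2))   # growth rate going from D = 0 to D = 1, in %
print(round(100 * g10, 2))   # growth rate going from D = 1 to D = 0, in %
```

The amount of change (β = 0.5) is the same in both directions, but the two growth rates differ in absolute value because their denominators differ.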

However, this situation can change very quickly once we move to a regression model that is non-linear, either in the variables or in the parameters (or both). 

That's what I want to focus on in this post. 

Let's consider some interesting examples that involve common transformations of the dependent variable in a regression model. Apart from anything else, such transformations are often undertaken to make the assumption of a normally distributed error term more reasonable.

Monday, July 2, 2018

Some Reading Suggestions for July

Some summertime reading:
  • Chen, T., J. DeJuan, & R. Tian, 2018. Distributions of GDP across versions of the Penn World Tables: A functional data analysis approach. Economics Letters, in press.
  • Clements, K. W., H. Liu, & Y. Tarverdi, 2018. Alcohol consumption, censorship and misjudgment. Applied Economics, online.
  • Jin, H., S. Zhang, J. Zhang, & H. Hao, 2018. Modified tests for change points in variance in the possible presence of mean breaks. Journal of Statistical Computation and Simulation, online.
  • Pata, U. K., 2018. The Feldstein-Horioka puzzle in E7 countries: Evidence from panel cointegration and asymmetric causality analysis. Journal of International Trade and Economic Development, online.
  • Sen, A., 2018. A simple unit root testing methodology that does not require knowledge regarding the presence of a break. Communications in Statistics - Simulation and Computation, 47, 871-889.
  • Wright, T., M. Klein, & K. Wieczorek, 2018. A primer on visualizations for comparing populations, including the issue of overlapping confidence intervals. American Statistician, online.

© 2018, David E. Giles

Sunday, July 1, 2018

Dummy Variables in a Semilogarithmic Regression: Exact Distributional Results

For better or worse, semilogarithmic regression models are used a lot in empirical economics. 

It would be nice to think that this is because the researcher found that a logarithmic transformation of the model's dependent variable led to residuals that were more "normally" distributed than without the transformation. Unfortunately, however, it's often just "for convenience". With this transformation, the estimates of the regression coefficients have a simple interpretation, as explained below.

I hate it when the latter situation arises. I've long since lost track of the number of times I've been at a seminar where the speaker has used this "simple interpretation" as an excuse for their choice of a semilogarithmic regression specification. For goodness sake, the choice of the model's functional form should be based on more than "convenience"!

For some of my previous comments about this point, see this post.

Most of you will know that when our semilogarithmic model includes a dummy (zero-one) regressor, we have to be careful about how we interpret that regressor's estimated coefficient. Suppose that we have the following regression model, where D is a dummy variable, and the X's are regressors that are measured "continuously":

   ln(yi) = α + β Di + Σj γj Xji + εi    ;     E(εi) = 0   ;   i = 1, ...., n.                         

Note that there's no loss of generality here in having just one dummy variable in the model.

Then, the interpretation of the regression coefficients is:
  1. A one-unit change in Xj leads to a proportional change of γj (or a percentage change of 100γj) in y.
  2. When the dummy variable changes from D = 0 to D = 1, the proportional change in y is [exp(β) -1]. Conversely, going from D = 1 to D = 0 implies a proportional change in y of  [exp(-β) -1]. Again, multiply by 100 to get a percentage change.
See Halvorsen and Palmquist (1980) for an explanation of the second of these results, and my comments in this earlier post.
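To see how much the exact result [exp(β) - 1] matters in practice, here's a short Python comparison. The dummy-coefficient values are hypothetical; the naive reading 100β is fine for small coefficients but badly misleading for large ones:

```python
import math

# Naive reading (100*beta) versus exact effect (100*(exp(beta) - 1)),
# for a few hypothetical dummy coefficients:
rows = []
for beta in (0.02, 0.10, 0.50):
    naive = 100 * beta
    exact = 100 * (math.exp(beta) - 1)
    rows.append((beta, round(naive, 2), round(exact, 2)))

for beta, naive, exact in rows:
    print(beta, naive, exact)
```

At β = 0.02 the two numbers are essentially the same; at β = 0.5 the naive "50%" understates the exact effect by almost fifteen percentage points.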

Kennedy (1981) and Giles (1982) discuss the issue of estimating this proportional change in the case of the dummy variable. Their results relate to point estimation - with a focus on unbiased estimation of the proportional change, when the model's errors are normally distributed.
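Kennedy's suggestion is easy to compute: replace β̂ with β̂ minus half its estimated variance inside the exponential. A quick sketch in Python, with purely hypothetical numbers for the estimate and its variance:

```python
import math

# Hypothetical OLS estimate of beta and its estimated variance (illustration only):
beta_hat, var_beta_hat = 0.25, 0.04

# Naive plug-in estimator of the proportional change:
naive = math.exp(beta_hat) - 1

# Kennedy's (1981) approximately unbiased estimator:
kennedy = math.exp(beta_hat - 0.5 * var_beta_hat) - 1

print(round(100 * naive, 2))     # naive estimate, in %
print(round(100 * kennedy, 2))   # bias-adjusted estimate, in %
```

The adjustment always pulls the estimate down, and the gap grows with the imprecision of β̂ - with a tightly estimated coefficient the two numbers are nearly identical.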

But what about interval estimation of this effect?