tag:blogger.com,1999:blog-21989425347406423842024-01-30T23:50:13.985-08:00Econometrics Beat: Dave Giles' Blog<i>A resource for econometrics students & practitioners</i>Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.comBlogger949125tag:blogger.com,1999:blog-2198942534740642384.post-67227202400408959222019-10-31T08:42:00.003-07:002019-10-31T08:42:54.329-07:00It's Time to Go<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
When I released my first post on the blog on 20th February 2011, I really wasn't sure what to expect! After all, I was aiming to reach a somewhat niche audience.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Well, 949 posts and 7.4 million page-hits later, this blog has greatly exceeded my wildest expectations. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
However, I'm now retired and I turned 70 three months ago. I've decided to call it quits, and this is my final post.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I'd rather make a definite decision about this than have the blog just fizzle into nothingness.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
For now, the <i>Econometrics Beat</i> blog will remain visible, but it will be closed for further comments and questions.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I've had a lot of fun and learned a great deal through this blog. I owe a debt of gratitude to all of you who've followed my posts, made suggestions, asked questions, made helpful comments, and drawn errors to my attention.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I just hope that it's been as positive an experience for you as it has been for me.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Thank you - and enjoy your Econometrics!</div>
<div style="text-align: justify;">
<br /></div>
<center>
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com68tag:blogger.com,1999:blog-2198942534740642384.post-1390675004045816982019-10-30T20:48:00.003-07:002019-10-31T08:26:45.385-07:00Everything's Significant When You Have Lots of Data<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
Well........, not really!<br />
<br />
It might seem that way on the face of it, but that's because you're probably using a totally inappropriate measure of what's (statistically) significant, and what's not.<br />
<br />
I talked a bit about this issue in a <span style="background-color: white;"><a href="https://davegiles.blogspot.com/2011/04/drawing-inferences-from-very-large-data.html#more">previous post</a></span>, where I said:<br />
<blockquote class="tr_bq">
<span style="font-family: inherit;"><span style="background-color: white; color: #444444;">"Granger (</span><a href="http://onlinelibrary.wiley.com/doi/10.1111/stan.1998.52.issue-3/issuetoc" style="background-color: white; color: #4d469c; text-decoration-line: none;">1998</a>,<span style="background-color: white; color: #444444;"> </span><a href="http://www.amazon.com/Computer-Aided-Econometrics-Statistics-Textbooks-Monogrphs/dp/0824742710?ie=UTF8&tag=econdaveblog-20&link_code=btl&camp=213689&creative=392969" style="background-color: white; color: #4d469c; text-decoration-line: none;" target="_blank">2003</a><span style="background-color: white; color: #444444;">) has reminded us that if the sample size is sufficiently large, then it's virtually impossible </span><em style="background-color: white; color: #444444;">not</em><span style="background-color: white; color: #444444;"> to reject almost any hypothesis. So, if the sample is very large and the </span><em style="background-color: white; color: #444444;">p</em><span style="background-color: white; color: #444444;">-values associated with the estimated coefficients in a regression model are of the order of, say, 0.10 or even 0.05, then this is really </span><strong style="background-color: white; color: #444444;"><u>bad</u></strong><span style="background-color: white; color: #444444;"> news. 
Much, much smaller </span><em style="background-color: white; color: #444444;">p</em><span style="background-color: white; color: #444444;">-values are needed before we get all excited about 'statistically significant' results when the sample size is in the thousands, or even bigger."</span></span></blockquote>
This general point, namely that our chosen significance level should be <i>decreased </i>as the sample size grows, is pretty well understood by most statisticians and econometricians. (For example, see Good, 1982.) However, it's usually ignored by the authors of empirical economics studies based on samples of thousands (or more) observations. Moreover, a lot of practitioners seem to be unsure of just <i>how much they should revise</i> their significance levels (or re-interpret their p-values) in such circumstances.<br />
<br />
There's really no excuse for this, because there are some well-established guidelines to help us. In fact, as we'll see, some of them have been around since at least the 1970s.<br />
<br />
Let's take a quick look at this, because it's something that all students need to be made aware of as we work more and more with "big data". Students certainly won't gain this awareness by looking at the interpretation of the results in the vast majority of empirical economics papers that use even sort-of-large samples!<br />
<br />
<a name='more'></a>The main result that I want to highlight is one that was brought into the econometrics literature by Leamer (1978). (Take a look at Chapter 4 of his book, referenced below - and especially p.116.)<br />
<br />
Let's set the scene by quoting from <a href="https://books.google.ca/books?id=gNyGDwAAQBAJ&pg=PT163&lpg=PT163&dq=angus+deaton+large+samples+critical+values&source=bl&ots=KtmzRga8Ez&sig=ACfU3U1GgFftj2qfQPRMOme-QPO0xy2Oog&hl=en&sa=X&ved=2ahUKEwiQkrrtrqPjAhUGmuAKHRo5BxgQ6AEwB3oECAkQAQ#v=onepage&q=angus%20deaton%20large%20samples%20critical%20values&f=false">Deaton</a> (2018, Chap. 2):<br />
<blockquote class="tr_bq">
"The effect most noted by empirical researchers is that the null hypothesis seems to be more frequently rejected in large samples than in small. Since it is hard to believe that the truth depends on the sample size, something else must be going on.......... As the sample size increases, and provided we are using a consistent estimation procedure, our estimates will be closer and closer to the truth, and less dispersed around it, so that discrepancies that were undetectable with small samples will lead to rejections in large samples........... </blockquote>
<blockquote class="tr_bq">
Over-rejection in large samples can also be thought of in terms of Type I and Type II errors. When we hold Type I error fixed and increase the sample size, all the benefits of increased precision are implicitly devoted to the reduction of Type II error.........</blockquote>
<blockquote class="tr_bq">
</blockquote>
<blockquote class="tr_bq">
Repairing these difficulties requires that the critical values of test statistics be raised with the sample size, so that the benefits of increased precision are more equally allocated between reductions in Type I and Type II errors. That said, it is a good deal more difficult to decide exactly how to do so, and to derive the rule from basic principles. Since classical procedures cannot provide such a basis, Bayesian alternatives are the obvious place to look."</blockquote>
And that's precisely what Leamer does. Also, see Schwarz (1978).<br />
<br />
Suppose that we have a linear multiple regression model with k regressors and n observations, and we want to test the null hypothesis that a set of q independent linear restrictions on the regression coefficients are satisfied. The alternative hypothesis is that at least one of the restrictions is violated. Under the very restrictive assumptions that we usually begin with in this context, an F-test would be used, and the associated statistic would be F-distributed with q and (n - k) degrees of freedom if the null is true.<br />
<br />
We would reject the null hypothesis if F > F<sub>c</sub>(α), where α is the chosen significance level, and F<sub>c</sub>(α) is the associated critical value. Alternatively, we would calculate the p-value associated with the observed value of F, and reject if this p-value is "small enough".<br />
<br />
We're interested in situations where n is large - probably <i>very large</i>. So, we can ignore the distinction between n and (n - k). Asymptotically, qF converges to a Chi-Square statistic with q degrees of freedom if the null is true. This is equivalent to the Wald test. Moreover, this convergence still holds if the restrictions under test are nonlinear, rather than linear. We would reject the null if qF > χ<sup>2</sup><sub>c</sub>(α), where again the "c" subscript denotes the appropriate critical value.<br />
<br />
It's the choice of α that has to be questioned here. Should we still set α = 10%, 5%, 1% if n is <i>very, very large</i>? (<b>No, we shouldn't!</b>)<br />
<br />
Equivalently, if n is very big, what is the appropriate magnitude of the p-value below which we should decide to reject the null hypothesis? Or, equivalently again, how should the critical value for this test be modified in very large samples?<br />
<br />
Leamer's result tells us that we should reject the null if F > (n / q)(n<sup>q/n</sup> - 1); or equivalently, if qF = χ<sup>2</sup> > n(n<sup>q/n</sup> - 1).<br />
<br />
It's important to note that this result is based on a Bayesian analysis with a particular approach to the diffuseness of the prior distribution.<br />
<br />
Also, recall that if we have a t-statistic with v degrees of freedom, then t<sup>2</sup> is F-distributed with 1 and v degrees of freedom. So, if we are testing the significance of a single regressor (<i>i.e.</i>, we are testing just one restriction), then Leamer's result tells us that we should reject the null that this coefficient is zero if t<sup>2</sup> > n(n<sup>1/n</sup> - 1). That is, we should reject against a 2-sided alternative hypothesis if |t| > √[n(n<sup>1/n</sup> - 1)]. (Remember, q = 1 in this case.)<br />
<br />
It's actually easy to check that n(n<sup>1/n</sup> - 1) is approximately equal to log<sub>e</sub>(n) for large values of n. Indeed, if n is very, very large, this approximation is still excellent even if q > 1 (as long as q is finite). Consider the following numerical examples:<br />
<br />
<div style="text-align: center;">
<b>Table 1</b></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhb-WZypQFO6TzMwh9hYk5CFXUN7VcPgv3DUHZKRSib_IgmXG5RGtq6fdrI1BpsL4XAQckbZ05w8UbFOhhgvkrfZciVBOi8zovN9B00PMv7aTIpYPZu6Qrm6_mmzJyX4aDknVRJVsDitMG3/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="142" data-original-width="258" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhb-WZypQFO6TzMwh9hYk5CFXUN7VcPgv3DUHZKRSib_IgmXG5RGtq6fdrI1BpsL4XAQckbZ05w8UbFOhhgvkrfZciVBOi8zovN9B00PMv7aTIpYPZu6Qrm6_mmzJyX4aDknVRJVsDitMG3/s1600/Capture.JPG" /></a></div>
<br />
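Leamer's cut-offs in Table 1, and the log<sub>e</sub>(n) approximation, are easy to verify for yourself. Here is a minimal Python sketch (my own illustration, not part of the original post; the function names are mine):

```python
import math

def leamer_critical_F(n, q):
    # Leamer's large-sample rule: reject H0 if F > (n / q) * (n**(q/n) - 1)
    return (n / q) * (n ** (q / n) - 1.0)

def leamer_critical_chi2(n, q):
    # Equivalent chi-square form: reject if qF = chi-square > n * (n**(q/n) - 1)
    return n * (n ** (q / n) - 1.0)

# Compare Leamer's F critical value with the log_e(n) approximation
for n in (1_000, 100_000, 10_000_000):
    for q in (1, 5):
        print(f"n={n:>10}, q={q}: F_crit={leamer_critical_F(n, q):8.4f}, "
              f"log_e(n)={math.log(n):8.4f}")

# The 2-sided t-test case (q = 1): reject if |t| > sqrt(n * (n**(1/n) - 1))
print(f"|t| cut-off for n=100,000: {math.sqrt(leamer_critical_F(100_000, 1)):.3f}")
```

For n = 100,000 and q = 5 this returns a critical F-value of about 11.516, matching the figure discussed below.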
So, what this means is that for very big samples Leamer's rule amounts to the following:<br />
<br />
<b>Reject </b><b>H<sub>0</sub>: "q independent restrictions on the model's parameters are true", if F > log<sub>e</sub>(n) ; or equivalently, if χ<sup>2</sup> > qlog<sub>e</sub>(n).</b><br />
<br />
How does this differ from what we'd do traditionally? (Remember, for large n we can ignore the distinction between n and (n - k).) Here are the corresponding critical F-values:<br />
<br />
<div style="text-align: center;">
<b>Table 2</b></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgk7OhSkwshZgDdWr9044WXd47Pfkzn6DgkkDgmQdFMZjpCLai2oeCI7r5DgYq3kROAX09VQ6lnU0-lGqHKz76E-Dypkzjn4SwhGpzsxznDrTeoQVkLkKvO4Xz1QFli0I5jAcNVbYen94Nn/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="142" data-original-width="322" height="141" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgk7OhSkwshZgDdWr9044WXd47Pfkzn6DgkkDgmQdFMZjpCLai2oeCI7r5DgYq3kROAX09VQ6lnU0-lGqHKz76E-Dypkzjn4SwhGpzsxznDrTeoQVkLkKvO4Xz1QFli0I5jAcNVbYen94Nn/s320/Capture.JPG" width="320" /></a></div>
<br />
We see that if n = 100,000 and q = 5, then using the F-test with conventional significance levels we'd reject the validity of the restrictions at the 10%, 5% and 1% significance levels if the F-statistic exceeded 1.8, 2.2, or 3.0 respectively. From Table 1, we see that the critical value in this case actually should be 11.5! You can quickly check for yourself that if we're applying a 2-sided t-test (q = 1), with n = 100,000, then we should reject the null hypothesis if |t| > √(11.5162) = 3.394.<br />
<br />
So, using conventional measures, we'll reject the validity of the restrictions far more often than we should. Youch!<br />
<br />
To look at things from a different perspective we can ask, "what sort of significance levels are being implied by Leamer's suggestion, relative to the levels (10%, 5%, <i>etc</i>.) that we typically use in practice?"<br />
<br />
Let's go back to Table 1, and focus on the last column of log<sub>e</sub>(n) critical values. The associated significance levels are as follows:<br />
<br />
<div style="text-align: center;">
<b>Table 3</b></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3K_7bcpmdDFdWKvcUYU8zOCOfefvw0apiq7y1JwvQensMmmmiHzxDqe9tp8KG8vsqclP4LqbwwMv9Gkga3-K_QBiUomf90PlgB99XMJgDlrQr4WRk4pZYKvS6zllqDsAwvqucuKSILXW5/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="141" data-original-width="275" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3K_7bcpmdDFdWKvcUYU8zOCOfefvw0apiq7y1JwvQensMmmmiHzxDqe9tp8KG8vsqclP4LqbwwMv9Gkga3-K_QBiUomf90PlgB99XMJgDlrQr4WRk4pZYKvS6zllqDsAwvqucuKSILXW5/s1600/Capture.JPG" /></a></div>
<div style="text-align: justify;">
<b><br /></b>
And in the case of the t-test example given beneath Table 2, the significance level associated with a critical value of 3.394 is 0.000345.<br />
<br /></div>
As we can see, when n is very large, the significance levels that <i><b>we should be using</b></i> (or, equivalently, the p-values that we should be using) are <u>much less</u> than the conventional levels that we tend to think of!<br />
<br />
<i>As an exercise, why don't you take a look back at one of your favourite applied econometrics papers that uses a very large sample size, and ask yourself, "do I really believe the conclusions that the author has reached?"</i><br />
<br />
If you want to read more on this topic, I suggest that you take a look at Lin <i>et al</i>. (2013), and Lakens (2018).<br />
<span style="font-family: inherit;"><br /></span>
</div>
<div style="text-align: justify;">
<b><span style="font-family: inherit;">References</span></b></div>
<div style="text-align: justify;">
<b><span style="font-family: inherit;"><br /></span></b></div>
<a href="https://www.amazon.ca/s?k=9781464813528&i=stripbooks&linkCode=qs">Deaton, A. S.</a><span style="font-family: inherit; text-align: justify;">, 2018. </span><i style="font-family: inherit; text-align: justify;">The Analysis of Household Surveys: A Microeconometric Approach to Development Policy</i><span style="font-family: inherit; text-align: justify;">. (Reissue edition with a new preface.) World Bank, Washington, D.C.</span><br />
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<a href="https://www.dropbox.com/s/uctrkoswmyxw5vu/Good82.png?dl=0">Good, I. J.</a>, 1982. C140. Standardized tail-area probabilities. <i>Journal of Statistical Computation and Simulation</i>, 16, 65–66.<br />
<br />
<a href="http://onlinelibrary.wiley.com/doi/10.1111/stan.1998.52.issue-3/issuetoc" style="background-color: white; color: #3778cd; font-family: inherit;">Granger</a><span style="background-color: white; color: #444444; font-family: inherit;">, C. W. J., 1998. Extracting information from mega-panels and high frequency data. </span><em style="background-color: white; color: #444444; font-family: inherit;">Statistica Neerlandica</em><span style="background-color: white; color: #444444; font-family: inherit;">, 52, 258-272.</span></div>
<span style="font-family: inherit;"><br style="background-color: white; color: #444444; text-align: justify;" /><span style="background-color: white; color: #444444; text-align: justify;">Granger, C. W. J., 2003. Some methodological questions arising from large data sets. In D. E. A. Giles (ed.), </span><a href="http://www.amazon.com/Computer-Aided-Econometrics-Statistics-Textbooks-Monogrphs/dp/0824742710?ie=UTF8&tag=econdaveblog-20&link_code=btl&camp=213689&creative=392969" style="background-color: white; color: #4d469c; text-align: justify; text-decoration-line: none;" target="_blank"><strong><em>Computer-Aided Econometrics</em></strong></a><span style="background-color: white; color: #444444; text-align: justify;">, CRC Press, Boca Raton, FL, 1-8.</span></span><br />
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span>
<br />
<a href="http://daniellakens.blogspot.com/2018/12/testing-whether-observed-data-should.html">Lakens, D.</a>, 2018. Justify your alpha by decreasing alpha levels as a function of the sample size. <i>The 20% Statistician Blog</i>.<br />
<br />
<span style="font-family: inherit;"></span>
<span style="font-family: inherit;"><a href="https://www.amazon.ca/Specification-Searches-Inference-Nonexperimental-Data/dp/0471015202">Leamer, E. E.</a>, 1978. <i>Specification Searches: Ad Hoc Inference With Nonexperimental Data</i>. Wiley, New York. (<a href="https://www.anderson.ucla.edu/faculty_pages/edward.leamer/downloadable_books/ucla_anderson_faculty_edward_leamer_downloadble_books.html">Legitimate free download</a>.)</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><a href="https://pdfs.semanticscholar.org/262b/854628d8e2b073816935d82b5095e1703977.pdf">Lin, M., H. C. Lucas Jr., and G. Shmueli</a>, 2013. Too big to fail: Large samples and the p-value problem. <i>Information Systems Research</i>, 24, 906-917.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><a href="https://projecteuclid.org/euclid.aos/1176344136">Schwarz, G.</a>, 1978. Estimating the dimension of a model. <i>Annals of Statistics</i>, 6, 461-464.</span><br />
<span style="font-family: inherit;"> </span></div>
<center style="text-align: center;">
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com2tag:blogger.com,1999:blog-2198942534740642384.post-80669989281344132112019-10-27T08:19:00.000-07:002019-10-27T08:19:25.177-07:00Reporting an R-Squared Measure for Count Data Models<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<i>This post was prompted by an email query that I received some time ago from a reader of this blog. I thought that a more "expansive" response might be of interest to other readers............</i><br />
<br />
In spite of its many limitations, it's standard practice to include the value of the coefficient of determination (R<sup>2</sup>) - or its "adjusted" counterpart - when reporting the results of a least squares regression. Personally, I think that R<sup>2</sup> is one of the least important statistics to include in our results, but we all do it. (See <a href="https://davegiles.blogspot.com/2013/05/good-old-r-squared.html">this previous post</a>.)</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
If the regression model in question is linear (in the parameters) and includes an intercept, and if the parameters are estimated by Ordinary Least Squares (OLS), then R<sup>2</sup> has a number of well-known properties. These include:</div>
<div style="text-align: justify;">
</div>
<ol>
<li>0 ≤ R<sup>2</sup> ≤ 1.</li>
<li>The value of R<sup>2</sup> cannot decrease if we add regressors to the model.</li>
<li>The value of R<sup>2</sup> is the same, whether we define this measure as the ratio of the "explained sum of squares" to the "total sum of squares" (R<sub>E</sub><sup>2</sup>); or as one minus the ratio of the "residual sum of squares" to the "total sum of squares" (R<sub>R</sub><sup>2</sup>).</li>
<li>There is a correspondence between R<sup>2</sup> and a significance test on all slope parameters; and there is a correspondence between changes in (the adjusted) R<sup>2</sup> as regressors are added, and significance tests on the added regressors' coefficients. (See <a href="https://davegiles.blogspot.com/2013/05/when-will-adjusted-r-squared-increase.html">here</a> and <a href="https://davegiles.blogspot.com/2013/07/the-adjusted-r-squared-again.html">here</a>.)</li>
<li>R<sup>2</sup> has an interpretation in terms of information content of the data. </li>
<li>R<sup>2</sup> is the square of the (Pearson) correlation (R<sub>C</sub><sup>2</sup>) between actual and "fitted" values of the model's dependent variable. </li>
</ol>
<div>
<div style="text-align: justify;">
However, as soon as we're dealing with a model that excludes an intercept or is non-linear in the parameters, or we use an estimator other than OLS, <i><b>none of the above properties are guaranteed</b></i>.<br />
<a name='more'></a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
For example, when reporting a linear model that's been estimated by Instrumental Variables, we get <i>different</i> R<sup>2</sup> values depending on which of the two definitions noted in property 3 above is adopted. Similarly, when estimating Logit and Probit models (for instance), most econometrics packages report several "pseudo-R<sup>2</sup>" statistics, because there's no single measure that has <i>all </i>of the desirable features that we're used to in the linear model/OLS case.<br />
<br />
So-called "count" data arise frequently in empirical economics. These are data that take values that are only non-negative integers, namely 0, 1, 2, 3, 4, ........ Models for such data are often based on the Poisson or negative binomial distributions, although other distributions may also be used. Regressors enter the model by equating the mean of the chosen distribution to a positive function of these variables and their coefficients.<br />
<br />
For instance, if the y<sub>i</sub> data (i = 1, 2, ...., n) are being modelled using a Poisson distribution with a mean of μ, then we typically assign μ<sub>i</sub> = exp[x<sub>i</sub>'β], using familiar regression notation. The resulting non-linear model is then estimated by MLE (or quasi-MLE).<br />
<br />
What's a sensible way of reporting an R<sup>2</sup> measure for an estimated Poisson regression?</div>
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
As with the Logit-Probit case noted above, several possibilities suggest themselves. However, unlike that other case, when modelling "count" data there is actually one definition of R<sup>2</sup> that really stands out as the obvious choice.<br />
<br />
What is it?<br />
<br />
Before answering this question, let's look at how R<sub>R</sub><sup>2</sup>, R<sub>E</sub><sup>2</sup>, and R<sub>C</sub><sup>2</sup> behave when applied in the context of Poisson, or negative binomial, regression. Some key facts include:<br />
<ul>
<li>The three measures will generally differ in value from one another.</li>
<li>We still have 0 ≤ R<sub>C</sub><sup>2</sup> ≤ 1. However, although R<sub>R</sub><sup>2</sup> ≤ 1 it can be negative (even if an intercept is included in the model); and although R<sub>E</sub><sup>2</sup> ≥ 0 it can be greater than one (even with an intercept).</li>
<li>All three measures can <i>decrease </i>as regressors are added to the model. </li>
</ul>
When we compare these results with the six properties noted above for the OLS case, they suggest that these R<sup>2</sup> measures are probably best avoided with count data models. Interestingly, it's R<sub>R</sub><sup>2</sup> that's reported as a matter of course by the EViews package. Stata, on the other hand, reports McFadden's "pseudo-R<sup>2</sup>" for these models, but its properties are no better.<br />
<br />
Cameron and Windmeijer (1996) effectively answer the question that I posed above.<br />
<br />
They consider various R<sup>2</sup>-type measures for count data models. These measures differ primarily in the type of residuals (from the estimated model) that are used in their construction. As in the case of a linear regression, <i>the usual, or "raw", residuals</i> are the differences between the actual y<sub>i</sub> values and their "predicted" mean values. That is, they're of the form (y<sub>i</sub> - μ<sub>i</sub>*), where μ<sub>i</sub>* = exp[x<sub>i</sub>'β*], and β* is the MLE of the β vector. These residuals give us R<sub>R</sub><sup>2</sup>, noted above.<br />
<br />
In regression analysis in general, there are actually lots of different forms of residuals that can be constructed, and these can be useful in various situations - especially with generalized linear models (of which the Poisson count model is an example). Some examples include the Pearson (standardized) residuals and the so-called "deviance" residuals. (For more on the notion of "deviance" and goodness-of-fit, see <a href="https://davegiles.blogspot.com/2014/06/the-deviance-information-criterion.html">this post</a>.)<br />
<br />
Cameron and Windmeijer (1996) consider the properties of R<sup>2</sup> measures for Poisson and negative binomial models based on both of these other types of residuals, as well as on the "raw" residuals. (Cameron and Windmeijer (1997) extend these results to a variety of other non-linear models.)<br />
<br />
They make a convincing case for constructing an R<sup>2</sup> measure using the deviance residuals, when working with a Poisson regression model or the negative binomial (NegBin2) model.<br />
<br />
(<i>As an aside, when the model is linear and we use OLS, the deviance residuals are just the usual residuals</i>.)<br />
<br />
For the Poisson model, the i<sup>th</sup> deviance residual is defined as<br />
<br />
d<sub>i</sub> = sign(y<sub>i</sub> - μ<sub>i</sub>*)[2{y<sub>i</sub>log(y<sub>i</sub> / μ<sub>i</sub>*) - (y<sub>i</sub> - μ<sub>i</sub>*)}]<sup>½ </sup>; i = 1, 2, ...., n<br />
<br />
and the deviance R<sup>2</sup> for that model is defined as:<br />
<br />
R<sub>D,P</sub><sup>2</sup> = 1 - Σ{y<sub>i</sub>log(y<sub>i</sub> / μ<sub>i</sub>*) - (y<sub>i</sub> - μ<sub>i</sub>*)} / Σ{y<sub>i</sub>log(y<sub>i</sub> / ybar)},<br />
<br />
where here and below all summations are for i = 1, 2, ...., n.<br />
<br />
If the model includes an intercept, then this formula simplifies to:<br />
<br />
R<sub>D,P</sub><sup>2</sup> = 1 - Σ{y<sub>i</sub>log(y<sub>i</sub> / μ<sub>i</sub>*)} / Σ{y<sub>i</sub>log(y<sub>i</sub> / ybar)}.<br />
<br />
(Note: if y<sub>i</sub> = 0, then y<sub>i</sub>log(y<sub>i</sub>) = 0. In this case, d<sub>i</sub> = - [2μ<sub>i</sub>*]<sup>½</sup>.)<br />
<br />
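To make the definition concrete, here is a minimal Python sketch of R<sub>D,P</sub><sup>2</sup> (my own illustration, not code from Cameron and Windmeijer), computed from the observed counts and the fitted means, using the y<sub>i</sub>log(y<sub>i</sub>) = 0 convention just noted:

```python
import math

def poisson_deviance_r2(y, mu):
    # Deviance R^2 for a Poisson regression (Cameron & Windmeijer, 1996).
    # y: observed counts; mu: fitted means mu_i* = exp[x_i' b*].
    # Convention: y * log(y / .) = 0 when y = 0.
    def ylog(a, b):
        return a * math.log(a / b) if a > 0 else 0.0
    ybar = sum(y) / len(y)
    # General form of the numerator: sum of {y log(y/mu*) - (y - mu*)}
    num = sum(ylog(yi, mi) - (yi - mi) for yi, mi in zip(y, mu))
    den = sum(ylog(yi, ybar) for yi in y)
    return 1.0 - num / den
```

With an intercept in the model, Σ(y<sub>i</sub> - μ<sub>i</sub>*) = 0 at the MLE, so this general form reduces to the simplified formula above. A perfect fit gives a value of one, and a fit using only the sample mean gives zero.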
<b>Importantly, R<sub>D,P</sub><sup>2</sup> satisfies the properties 1 to 5 noted earlier</b>.<br />
<br />
In the case of the NegBin2 model, the corresponding R<sup>2</sup> takes the form:<br />
<br />
R<sub>D,NB</sub><sup>2</sup> = 1 - (A / B) ,<br />
<br />
where<br />
<br />
A = Σ{y<sub>i</sub>log(y<sub>i</sub> / μ<sub>i</sub>*) - (y<sub>i</sub> + α*<sup>-1</sup>)log[(y<sub>i</sub> + α*<sup>-1</sup>) / (μ<sub>i</sub>* + α*<sup>-1</sup>)]}<br />
<br />
and<br />
<br />
B = Σ{y<sub>i</sub>log(y<sub>i</sub> / ybar) - (y<sub>i</sub> + α*<sup>-1</sup>)log[(y<sub>i</sub> + α*<sup>-1</sup>) / (ybar + α*<sup>-1</sup>)]}.<br />
<br />
("ybar" is the sample average of the y<sub>i</sub> values; and α* is the MLE of the dispersion parameter for the NegBin2 distribution.)<br />
<br />
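Again as an illustrative sketch (mine, not the paper's), the NegBin2 measure can be computed the same way, given an estimate of the dispersion parameter:

```python
import math

def negbin2_deviance_r2(y, mu, alpha):
    # Deviance R^2 for the NegBin2 model (Cameron & Windmeijer, 1996).
    # alpha: estimated dispersion parameter alpha*.
    # Convention: y * log(y / .) = 0 when y = 0.
    inv = 1.0 / alpha
    def ylog(a, b):
        return a * math.log(a / b) if a > 0 else 0.0
    def dev(yi, m):
        # One summand of A (m = mu_i*) or of B (m = ybar)
        return ylog(yi, m) - (yi + inv) * math.log((yi + inv) / (m + inv))
    ybar = sum(y) / len(y)
    A = sum(dev(yi, mi) for yi, mi in zip(y, mu))
    B = sum(dev(yi, ybar) for yi in y)
    return 1.0 - A / B
```

As with the Poisson version, a perfect fit gives a value of one, and fitting only the sample mean gives zero.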
<b>The R<sub>D,NB</sub><sup>2</sup> goodness-of-fit measure satisfies properties 1, 3 and 4 noted earlier.</b><br />
<b><br /></b>
So, when it comes to reporting an R<sup>2</sup> for count data models, the usual such measure - based on the "raw" residuals - is generally a very poor choice. <i>Of the other options that are available, the R<sup>2</sup> measures constructed using the so-called "deviance residuals" stand out as excellent contenders.</i><br />
<br />
<b><br /></b>
</div>
</div>
<div style="text-align: justify;">
<b>References</b></div>
<div style="text-align: justify;">
<b><br /></b></div>
<div style="text-align: justify;">
<a href="https://amstat.tandfonline.com/doi/abs/10.1080/07350015.1996.10524648#.XQ4ypOhKhPY">Cameron, A. C. & F. A. C. Windmeijer</a>, 1996. R-squared measures for count data regression models with applications to health-care utilization. <i>Journal of Business and Economic Statistics</i>, 14, 209-220. (<a href="https://pdfs.semanticscholar.org/4eae/328143f1e37d77413df257044b94dab3a0c6.pdf">Download</a> working paper version.)<br />
<br />
<a href="https://www.sciencedirect.com/science/article/pii/S0304407696018180">Cameron, A. C. & F. A. C. Windmeijer</a>, 1997. An R-squared measure of goodness of fit for some common nonlinear regression models. <i>Journal of Econometrics</i>, 77, 329-342.<br />
<br /></div>
<center style="text-align: center;">
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com2tag:blogger.com,1999:blog-2198942534740642384.post-79343077302539294962019-10-07T13:00:00.004-07:002019-10-07T13:00:34.718-07:00October Reading<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
Here's my latest, and final, list of suggested reading:</div>
<div style="text-align: justify;">
</div>
<ul>
<li><a href="http://crest.science/RePEc/wpstorage/2019-13.pdf">Bellego, C. and L-D. Pape</a>, 2019. Dealing with the log of zero in regression models. CREST Working Paper No. 2019-13.</li>
<li><a href="https://www.economics.ox.ac.uk/materials/working_papers/4982/selecting-a-model-for-forcasting-861.pdf">Castle, J. L., J. A. Doornik, and D. F. Hendry</a>, 2018. Selecting a model for forecasting. Department of Economics, University of Oxford, Discussion Paper 861.</li>
<li><a href="https://www.rba.gov.au/publications/rdp/2019/pdf/rdp2019-08.pdf">Gorajek, A.</a>, 2019. The well-meaning economist. Reserve Bank of Australia, Research Discussion Paper RDP 2019-08.</li>
<li><a href="https://www.tandfonline.com/doi/full/10.1080/03610918.2018.1473591">Güriş, B.</a>, 2019. A new nonlinear unit root test with Fourier function. <i>Communications in Statistics - Simulation and Computation</i>, 48, 3056-3062.</li>
<li><a href="https://bostonreview.net/science-nature/tim-maudlin-why-world">Maudlin, T.</a>, 2019. The why of the world. Review of <i>The Book of Why: The New Science of Cause and Effect</i>, by J. Pearl and D. Mackenzie. <i>Boston Review</i>.</li>
<li><a href="https://www.mdpi.com/2225-1146/7/3/39">Qian, W., C. A. Rolling, G. Cheng, and Y. Yang</a>, 2019. On the forecast combination puzzle. <i>Econometrics</i>, 7, 39. </li>
</ul>
<br />
<center>
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com0tag:blogger.com,1999:blog-2198942534740642384.post-26672961465000917802019-09-01T06:40:00.003-07:002019-09-01T06:41:04.343-07:00Back to School Reading<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
Here we are - it's Labo(u)r Day weekend already in North America, and we all know what that means! It's back to school time.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
You'll need a reading list, so here are some suggestions:<br />
<br />
<ul>
<li><a href="https://ideas.repec.org/p/ems/eureir/118666.html">Franses, P. H.</a>, 2019. Professional forecasters and January. Econometric Institute Research Papers EI2019-25, Erasmus University Rotterdam.</li>
<li><a href="https://www.sciencedirect.com/science/article/pii/S030440761930106X?dgcid=rss_sd_all">Harvey, A. & R. Ito</a>, 2019. Modeling time series when some observations are zero. <i>Journal of Econometrics</i>, in press.</li>
<li><a href="https://www.anderson.ucla.edu/faculty_pages/edward.leamer/books/specification_searches.htm">Leamer, E. E.</a>, 1978. <i>Specification Searches: Ad Hoc Inference With Nonexperimental Data.</i> Wiley, New York. (This is a legitimate free download.)</li>
<li><a href="https://ideas.repec.org/p/qed/wpaper/1413.html">MacKinnon, J. G.</a>, 2019. How cluster-robust inference is changing applied econometrics. Working Paper 1413, Economics Department, Queen's University.</li>
<li><a href="https://arxiv.org/pdf/1709.08221.pdf">Steel, M. F. J.</a>, 2019. Model averaging and its use in economics. Mimeo., Department of Statistics, University of Warwick.</li>
<li><a href="https://projecteuclid.org/euclid.aos/1176345451">Stigler, S. M.</a>, 1981. Gauss and the invention of least squares. <i>Annals of Statistics</i>, 9, 465-474. </li>
</ul>
</div>
<center style="text-align: center;">
© 2019, David E. Giles</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com0tag:blogger.com,1999:blog-2198942534740642384.post-7595575026408216012019-08-20T11:25:00.002-07:002019-08-20T11:25:42.538-07:00Book Series on "Statistical Reasoning in Science & Society"<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
Back in early 2016, the <a href="http://www.amstat.org/">American Statistical Association</a> (ASA) made an <a href="https://magazine.amstat.org/blog/2016/02/01/asacrc16/">announcement</a> in its newsletter, <i><a href="https://magazine.amstat.org/">Amstat News</a></i>, about the introduction of an important new series of books. In part, that announcement said:</div>
<blockquote class="tr_bq" style="text-align: justify;">
"The American Statistical Association recently partnered with <a href="http://crcpress.com/">Chapman & Hall/CRC Press</a> to launch a book series called the <a href="https://www.crcpress.com/go/asacrc"><i>ASA-CRC</i> </a><i><a href="https://www.crcpress.com/go/asacrc">Series on Statistical Reasoning in Science and Society</a>.</i><span style="text-align: left;"> </span></blockquote>
<blockquote class="tr_bq" style="text-align: justify;">
'The ASA is very enthusiastic about this new series,' said 2015 ASA President David Morganstein, under whose leadership the arrangement was made. 'Our strategic plan includes increasing the visibility of our profession. One way to do that is with books that are readable, exciting, and serve a broad audience having a minimal background in mathematics or statistics.'<span style="text-align: left;"> </span></blockquote>
<blockquote class="tr_bq" style="text-align: justify;">
The Chapman & Hall/CRC press release states the book series will do the following:<ul style="text-align: left;">
<li style="text-align: justify;">Highlight the important role of statistical and probabilistic reasoning in many areas</li>
<li style="text-align: justify;">Require minimal background in mathematics and statistics</li>
<li style="text-align: justify;">Serve a broad audience, including professionals across many fields, the general public, and students in high schools and colleges</li>
<li style="text-align: justify;">Cover statistics in wide-ranging aspects of professional and everyday life, including the media, science, health, society, politics, law, education, sports, finance, climate, and national security</li>
<li style="text-align: justify;">Feature short, inexpensive books of 100–150 pages that can be written and read in a reasonable amount of time."</li>
</ul>
</blockquote>
<div style="text-align: justify;">
Seven titles have now been published in this series -</div>
<div style="text-align: justify;">
<br /></div>
<i>Measuring Society</i>, by Chaitra H. Nagaraja (2019)<br />
<i>Measuring Crime: Behind the Statistics</i>, by Sharon L. Lohr (2019)<br />
<i>Statistics and Health Care Fraud: How to Save Billions</i>, by Tahir Ekin (2019)<br />
<i>Improving Your NCAA® Bracket with Statistics</i>, by Tom Adams (2018)<br />
<i>Data Visualization: Charts, Maps, and Interactive Graphics</i>, by Robert Grant (2018)<br />
<i>Visualizing Baseball</i>, by Jim Albert (2017)<br />
<i>Errors, Blunders, and Lies: How to Tell the Difference</i>, by David S. Salsburg (2017)<br />
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Readers of this blog should be especially interested in Chaitra Nagaraja's <a href="https://www.crcpress.com/Measuring-Society/Nagaraja/p/book/9781138035980">recently published addition</a> to this series. <a href="https://www.fordham.edu/info/22941/full-time_faculty/4830/chaitra_h_nagaraja">Chaitra</a> devotes chapters in her book to the topics of Jobs, Inequality, Housing, Prices, Poverty, and Deprivation. I particularly like the historical perspective that Chaitra provides in this very readable contribution, and I recommend her book to you (and your non-economist friends). </div>
<div style="text-align: justify;">
<br /></div>
<center>
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com0tag:blogger.com,1999:blog-2198942534740642384.post-3526699954295388042019-08-14T14:28:00.000-07:002019-08-14T14:28:02.825-07:00Check out What Happened at the 2019 Joint Statistical Meetings<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
Each year, the Joint Statistical Meetings (JSM) bring together thousands (6,500 this year) of statisticians at what's the largest gathering of its type in the world. The JSM represent eleven international statistics organisations, including the four founding organisations - The American Statistical Association (ASA), The International Biometric Society, The Institute of Mathematical Statistics, and The Statistical Society of Canada.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
As a member of the <a href="http://www.amstat.org/">ASA</a> since 1973, I've attended a few of these meetings over the years, but unfortunately I didn't make it to the <a href="https://ww2.amstat.org/meetings/jsm/2019/index.cfm">JSM in Denver</a> at the end of last month. As always, the program was amazing.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Yesterday, the ASA released a searchable version of the 2019 program that contains downloadable files of the slides used by many of the speakers. You can find that version of the program <a href="http://ww2.amstat.org/meetings/jsm/2019/onlineprogram/index.cfm?utm_source=informz&utm_medium=email&utm_campaign=asa&_zs=4kMOe1&_zl=S6I46">here</a>. When you go through the program, look for presentations that have a blue (rectangular) "Presentation" button. Papers in sessions sponsored by the Business and Economic Statistics section of the ASA may be of special interest to you - but there's lots to choose from!</div>
<div style="text-align: justify;">
<br /></div>
<center>
© 2019, David E. Giles</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com1tag:blogger.com,1999:blog-2198942534740642384.post-48025224550449320802019-08-06T06:56:00.003-07:002019-08-06T06:57:37.716-07:00Including More History in Your Econometrics Teaching<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="background-color: white;">
<div style="text-align: justify;">
If you follow this blog (or if you look at the "History of Econometrics" label in the word cloud in the right side-bar), you'll know that I have more than a passing interest in the history of our discipline. There's so much to be learned from this history. Among other things, we can gain insights into why certain methods became popular, and we can reduce the risk of repeating earlier mistakes!</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
When I was teaching I liked to inject a few historical facts/anecdotes/curiosities into my classes. I think that this brought the subject matter to life a little. The names behind the various theorems, tests, and estimators are those of <i>real people</i>, after all.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
There are some excellent books on the history of econometrics, including those by Epstein (1987), Morgan (1990), and De Marchi and Gilbert (1990). <span style="font-family: inherit;">(Also, see the short piece by Stephen Pollock, 2014.)</span></div>
</div>
<div style="background-color: white;">
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">However, I think that we could do more in terms of making material about this history accessible to our students.</span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">The Statistics community has gone much further in this direction, and we might take note of this.</span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
</div>
<div style="background-color: white;">
<div style="text-align: justify;">
<span style="font-family: inherit;">The other day, <a href="https://biostatistics.uams.edu/faculty-and-staff/amanda-golbeck-3/">Amanda Golbeck</a> posted some very helpful links on the <a href="https://www.amstat.org/">American Statistical Association's</a> <span style="font-family: inherit;">"</span><a href="https://community.amstat.org/historyofstats/home">History of Statistics Interest Group</a>" community noticeboard.</span></div>
</div>
<div style="background-color: white;">
<span style="font-family: inherit;"><br /></span></div>
<div style="background-color: white;">
<span style="font-family: inherit;">Here's <a href="https://community.amstat.org/communities/community-home/digestviewer/viewthread?MessageKey=27c83fb3-28c3-4c61-ad53-321b8dadf1fe&CommunityKey=6b2d607a-e31f-4f19-8357-020a8631b999&tab=digestviewer#bm27c83fb3-28c3-4c61-ad53-321b8dadf1fe">her posting</a> in its entirety - and don't miss the first of her links:</span></div>
<div style="background-color: white;">
<span style="font-family: inherit;"><br /></span></div>
<div style="background-color: white;">
<span style="font-family: inherit;">"Why not include more history in your teaching? The History of Statistics Interest Group library has a collection of Activities for Classes: <a href="https://community.amstat.org/historyofstats/ourlibrary/new-item2" rel="noreferrer" style="color: #336699;" target="_blank" title="https://community.amstat.org/historyofstats/ourlibrary/new-item2">community.amstat.org/historyofstats/ourlibrary/...</a></span></div>
<div style="background-color: white;">
<span style="font-family: inherit;"><br />We are pleased to let you know that Bob Rosenfeld has created 13 history of probability and statistics teaching modules, and he has kindly made them available for you to use in your classes! We hope you will find them to be useful.</span></div>
<div style="background-color: white;">
<br /></div>
<div style="background-color: white;">
<span style="font-family: inherit;">Reading and Exercises on the History of Probability from the <a href="https://www.vmimathematics.com/">Vermont Mathematics Initiative</a>, Bob Rosenfeld</span></div>
<ul style="background-color: white;">
<li><span style="font-family: inherit;"><a href="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HP_1_Pre-history_to_1600.pdf" rel="noreferrer" style="color: #336699;" target="_blank" title="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HP_1_Pre-history_to_1600.pdf">Pre-history to 1600</a> (PDF)</span></li>
<li><span style="font-family: inherit;"><a href="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HP_2_17th_Century_France.pdf" rel="noreferrer" style="color: #336699;" target="_blank" title="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HP_2_17th_Century_France.pdf">17th Century France</a> (PDF)</span></li>
<li><span style="font-family: inherit;"><a href="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HP_3_Jacob_Bernoulli_-_Law_of_Large_Numbers.pdf" rel="noreferrer" style="color: #336699;" target="_blank" title="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HP_3_Jacob_Bernoulli_-_Law_of_Large_Numbers.pdf">Jacob Bernoulli - Law of Large Numbers</a> (PDF)</span></li>
<li><span style="font-family: inherit;"><a href="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HP_4_Inverse_Probability_-_Thomas_Bayes.pdf" rel="noreferrer" style="color: #336699;" target="_blank" title="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HP_4_Inverse_Probability_-_Thomas_Bayes.pdf">Inverse Probability - Thomas Bayes</a> (PDF)</span></li>
<li><span style="font-family: inherit;"><a href="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HP_5_Laplace.pdf" rel="noreferrer" style="color: #336699;" target="_blank" title="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HP_5_Laplace.pdf">Laplace</a> (PDF)</span></li>
</ul>
<div style="background-color: white;">
<span style="font-family: inherit;">Reading and Exercises on the History of Statistics from the <a href="https://www.vmimathematics.com/">Vermont Mathematics Initiative</a>, Bob Rosenfeld</span></div>
<ul style="background-color: white;">
<li><span style="font-family: inherit;"><a href="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HS_1__John_Graunt_and_the_Bills_of_Mortality.pdf" rel="noreferrer" style="color: #336699;" target="_blank" title="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HS_1__John_Graunt_and_the_Bills_of_Mortality.pdf">John Graunt and the Bills of Mortality</a> (PDF)</span></li>
<li><span style="font-family: inherit;"><a href="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HS_2__Origin_of_the_Normal_Curve.pdf" rel="noreferrer" style="color: #336699;" target="_blank" title="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HS_2__Origin_of_the_Normal_Curve.pdf">Origin of the Normal Curve</a> (PDF)</span></li>
<li><span style="font-family: inherit;"><a href="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HS_3_Origins_of_graphs_in_statistics.pdf" rel="noreferrer" style="color: #336699;" target="_blank" title="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HS_3_Origins_of_graphs_in_statistics.pdf">Origins of Graphs in Statistics</a> (PDF)</span></li>
<li><span style="font-family: inherit;"><a href="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HS_4_Fitting_models_to_data_-_the_path_to_Least_Squares.pdf" rel="noreferrer" style="color: #336699;" target="_blank" title="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HS_4_Fitting_models_to_data_-_the_path_to_Least_Squares.pdf">Fitting models to data - the Path to Least Squares</a> (PDF)</span></li>
<li><span style="font-family: inherit;"><a href="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HS_5_Statistics_moves_from_physical_to_social_sciences.pdf" rel="noreferrer" style="color: #336699;" target="_blank" title="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HS_5_Statistics_moves_from_physical_to_social_sciences.pdf">Statistics Moves from Physical to Social Sciences</a> (PDF)</span></li>
<li><span style="font-family: inherit;"><a href="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HS_6_Correlation_Francis_GALTON.pdf" rel="noreferrer" style="color: #336699;" target="_blank" title="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HS_6_Correlation_Francis_GALTON.pdf">Correlation - Francis Galton</a> (PDF)</span></li>
<li><span style="font-family: inherit;"><a href="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HS_7__t-ditribution_and_GOSSET.pdf" rel="noreferrer" style="color: #336699;" target="_blank" title="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HS_7__t-ditribution_and_GOSSET.pdf">t-Distribution and Gosset </a>(PDF)</span></li>
<li><span style="font-family: inherit;"><a href="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HS_8__FISHER_and_Design_of_experiments.pdf" rel="noreferrer" style="color: #336699;" target="_blank" title="https://higherlogicdownload.s3.amazonaws.com/AMSTAT/1484431b-3202-461e-b7e6-ebce10ca8bcd/UploadedImages/Classroom_Activities/HS_8__FISHER_and_Design_of_experiments.pdf">Fisher and Design of Experiments</a> (PDF)"</span></li>
</ul>
<div style="text-align: justify;">
(Bob Rosenfeld is a former Co-Director for Statistics and School-Based Research at the Vermont Mathematics Initiative, and the author of a number of books on the teaching of statistics to K-8 students. D.G.)</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Most of Bob Rosenfeld's pieces are directly relevant to econometrics students. It would be nice to see more material about the history of our discipline that could be incorporated into introductory econometrics courses.<br />
<br />
<b>References </b><br />
<br />
<a href="http://www.amazon.com/History-Methodology-Econometrics-Economic-Special/dp/0198283113/ref=sr_1_11?keywords=history+of+econometric&qid=1565099441&s=gateway&sr=8-11">De Marchi, N. & C. Gilbert</a>, 1990. <i>History and Methodology of Econometrics</i>. Oxford University Press, Oxford.<br />
<br />
<a href="https://www.amazon.com/History-Econometrics-Contributions-Economic-Analysis-ebook/dp/B01E3IISEE/ref=sr_1_2?keywords=a+history+of+econometric&qid=1565099332&s=gateway&sr=8-2">Epstein, R. J.</a> 1987. <i>A History of Econometrics. </i>North-Holland, Amsterdam.<br />
<br />
<a href="https://www.amazon.ca/History-Econometric-Ideas-Professor-Morgan/dp/0521424658">Morgan, M. S.</a>, 1990. <i>The History of Econometric Ideas</i>. Cambridge University Press, Cambridge.<br />
<br />
<a href="https://www.le.ac.uk/economics/research/RePEc/lec/leecon/dp14-05.pdf">Pollock, D. S. G.</a>, 2014. Econometrics - An historical guide for the uninitiated. Working Paper No. 14/05, Department of Economics, University of Leicester.<br />
<br /></div>
<center>
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com2tag:blogger.com,1999:blog-2198942534740642384.post-83403542178144254132019-08-02T10:20:00.001-07:002019-08-02T10:20:22.510-07:00Suggested Reading for August<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
Here are my suggestions for this month:</div>
<div style="text-align: justify;">
<ul>
<li><a href="https://www.tandfonline.com/doi/full/10.1080/07474938.2018.1427486">Bun, M. J. G. & T. D. Harrison</a>, 2019. OLS and IV estimation of regression models including endogenous interaction terms. <i>Econometric Reviews</i>, 38, 814-827.</li>
<li><a href="https://www.tandfonline.com/doi/abs/10.1080/07350015.2017.1371027">Dufour, J-M., E. Flachaire, & L. Khalaf</a>, 2019. Permutation tests for comparing inequality measures. <i>Journal of Business and Economic Statistics</i>, 37, 457-470.</li>
<li>Jiao, X. & F. Pretis, 2018. Testing the presence of outliers in regression models. Available at SSRN: <a href="https://ssrn.com/abstract=3217213" target="_blank">https://ssrn.com/abstract=3217213</a>.</li>
<li><a href="https://www.tandfonline.com/doi/pdf/10.1080/10691898.2001.11910537?needAccess=true">Stanton, J. M.</a>, 2001. Galton, Pearson, and the peas: A brief history of linear regression for statistics instructors. <i>Journal of Statistics Education</i>, 9, 1-13.</li>
<li><a href="https://www.mdpi.com/2225-1146/7/2/26">Trafimow, D.</a>, 2019. A frequentist alternative to significance testing, p-values and confidence intervals. <i>Econometrics</i>, 7, 26.</li>
</ul>
</div>
<center style="text-align: center;">
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com1tag:blogger.com,1999:blog-2198942534740642384.post-75040087965503424522019-07-28T08:40:00.003-07:002019-07-28T08:40:40.795-07:00AAEA Meeting, 2019<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
The Agricultural and Applied Economics Association (<a href="https://www.aaea.org/">AAEA</a>) recently held its <a href="https://www.aaea.org/meetings/2019-aaea-annual-meeting">annual meeting</a> in Atlanta, GA. You can find the extensive program <a href="https://www.aaea.org/UserFiles/file/am19-final_v15.pdf">here</a>.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This year, I was fortunate enough to be able to attend and participate.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This was thanks to the kind invitation of <a href="https://www.apec.umn.edu/people/marc-f-bellemare">Marc Bellemare</a>, a member of the Executive Board of the AAEA, and (of course) a <a href="http://marcfbellemare.com/wordpress/">blogger</a> whom many of you no doubt follow. (If you don't, then you should!) </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Marc arranged a session in which he and I talked about the pros and cons of <i>The Cookbook Approach to Teaching Econometrics</i>. The session was well attended, and the bulk of the time was devoted to a very helpful discussion-question-answer period with the audience.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
As you'll know from some of my previous posts (<i>e.g</i>., <a href="https://davegiles.blogspot.com/2011/05/cookbook-econometrics.html">here</a> and <a href="https://davegiles.blogspot.com/2013/05/cookbook-econometrics-reprise.html">here</a>), I'm not a big fan of <i>The Cookbook Approach - </i>at least, not if it's the primary/sole way of teaching econometrics. Marc made the point that there's a place for this approach if it's adopted after more formal courses in econometrics. I'm in agreement with that.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I put together a few background talking-point slides for my short presentation. For what they're worth, you'll find them <a href="https://www.dropbox.com/s/hhxjrmjtj8c2qx6/Presentation.pdf?dl=0">here</a>.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I really enjoyed my time at the AAEA meeting, and I learned a lot. Thanks, Marc, and thank you to the participants!</div>
<div style="text-align: justify;">
<br /></div>
<center>
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com3tag:blogger.com,1999:blog-2198942534740642384.post-35907940836596214822019-07-06T10:25:00.003-07:002019-08-21T08:52:30.359-07:00Seasonal Unit Roots - Background Information<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
A recent email query about the language that we use in the context of non-stationary seasonal data, and how we should respond to the presence of "seasonal unit roots", suggested to me that a short background post about some of this might be in order.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
To get the most from what follows, I suggest that you take a quick look at this <a href="http://davegiles.blogspot.ca/2015/12/seasonal-unit-root-testing-in-eviews.html">earlier post</a> of mine - especially to make sure that you understand the distinction between "deterministic seasonality" and "stochastic seasonality" in time-series data.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
There's an extensive econometrics literature on stochastic seasonality and testing for seasonal unit roots, dating back at least to 1990. This is hardly a new topic, but it's one that's often overlooked in empirical applications.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Although several tests for seasonal unit roots are available, the most commonly used one is that proposed by Hylleberg <i>et al</i>. (1990) - hereafter "HEGY". Depending on what statistical/econometrics package you prefer to use, you'll have at least some access to the HEGY test(s), and perhaps some others. For instance, there are routines that you can use with R, Stata, and Gretl.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The <a href="http://www.eviews.com/home.html">EViews</a> package includes a rather complete built-in suite of different seasonal unit root tests for time series data with various periodicities - 2, 4, 5, 6, 7, and 12. This enables us to deal with trading-day weekly data, and calendar weekly data, as well as the usual "seasonal" frequencies. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I'm not going to be going over the tests themselves here.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Rather, the objectives of this post are, first, to provide a bit of background information about the language that's used when we're talking about seasonal unit roots. For instance, why do we refer to roots at the zero frequency, the π frequency, <i>etc</i>.? Second, in what way(s) do we need to filter a time series in order to remove the unit roots at the various frequencies?</div>
<div style="text-align: justify;">
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Let's begin by considering a <i>quarterly</i> time series, X<sub>t</sub> (t = 1, 2, ........). We'll use the symbol "L" to denote the lag operator. So, L(X<sub>t</sub>) = X<sub>t-1</sub>; L<sup>2</sup>(X<sub>t</sub>) = L(L(X<sub>t</sub>)) = L(X<sub>t-1</sub>) = X<sub>t-2</sub>; <i>etc</i>. In general, L<sup>k</sup>(X<sub>t</sub>) = X<sub>t-k</sub>.</div>
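In code, the lag operator is just a shift of the data. Here's a tiny illustration in Python (a sketch of my own; any package's lag/shift function plays the role of L):

```python
import numpy as np

# A toy quarterly series X_t, t = 1,...,8 (two years of made-up data).
x = np.array([10, 12, 9, 11, 13, 15, 12, 14], dtype=float)

def lag(series, k):
    """Apply L^k: shift the series back k periods (NaN where no lag exists)."""
    out = np.full_like(series, np.nan)
    out[k:] = series[:-k]
    return out

lag1 = lag(x, 1)             # L(X_t) = X_{t-1}
annual_diff = x - lag(x, 4)  # (1 - L^4)X_t: the annual (seasonal) difference

print(annual_diff)           # first four entries are NaN (no year-earlier value)
```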
<div style="text-align: justify;">
<br />
<a name='more'></a><br />
<b><span style="color: #cc0000;">Unit roots at different frequencies</span></b><br />
<br /></div>
<div style="text-align: justify;">
If we consider the difference between the value of the series, X, now and its value four quarters (one year) ago, we can represent this by (X<sub>t</sub> - X<sub>t-4</sub>) = (X<sub>t</sub> - L<sup>4</sup>X<sub>t</sub>) = (1 - L<sup>4</sup>)X<sub>t</sub>. Let's take a closer look at the polynomial equation, (1 - L<sup>4</sup>) = 0, in the lag operator, and ask: what are its roots?</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
We can factorize (1 - L<sup>4</sup>) as follows: </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
(1 - L<sup>4</sup>) = (1 - L<sup>2</sup>)(1 + L<sup>2</sup>)<br />
= (1 - L)(1 + L)(1 + L<sup>2</sup>) = (1 - L)(1 + L)(1 + iL)(1 - iL) = 0, (1)</div>
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
and then we see that the roots of (1) are L = 1; L = -1; L = i; and L = -i. Here, "i" is the imaginary number, whose square is -1.</div>
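If you'd like to verify those four roots numerically, a quick check with Python's numpy (my own illustration) does the job:

```python
import numpy as np

# Roots of the lag polynomial 1 - L^4 = 0. numpy.roots takes coefficients
# from the highest power down: -1*L^4 + 0*L^3 + 0*L^2 + 0*L + 1.
roots = np.roots([-1, 0, 0, 0, 1])

print(np.sort_complex(roots))  # the four roots: -1, -i, +i, +1
print(np.abs(roots))           # all have modulus r = 1 (they lie on the unit circle)
```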
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
In fact, each of these roots can be written as a complex number of the form (x + iy). For instance, the root L = 1 corresponds to the case x = 1, y = 0. You may also recall that instead of expressing a complex number in terms of Cartesian coordinates, we can write it in terms of <span style="background-color: white;"><a href="http://davegiles.blogspot.ca/2016/01/why-does-pi-appear-in-normal-density.html">polar coordinates</a></span><span style="background-color: white;">. That is, we can write it in the form r(cosθ + i sinθ), where "r" is what we call the "radial coordinate", and θ is the "angular coordinate". <i>Without loss of generality, we'll normalize the former and set r = 1 in what follows</i>.</span></div>
<div style="text-align: justify;">
<span style="background-color: white;"><br /></span></div>
<div style="text-align: justify;">
<span style="background-color: white;">Alright, so where does this leave us? Well, we're going to have to recall some of that trigonometry that you learned in high school!</span></div>
<div style="text-align: justify;">
<span style="background-color: white;"><br /></span></div>
<div style="text-align: justify;">
<span style="background-color: white;">Let's look at the graphs for the sine and cosine functions, with the argument (x) measured in radians:</span><br />
<span style="background-color: white;"><br /></span><span style="background-color: white;"><br /></span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTeMSPA4u6sPVZM9Z-XFLUDeDhyiERJ7bDz_aaPAgAstJsWFYrfgxWU1LEvsoh4rJT_Ivqrz7oZaSVt_riURPtaDJIPo2Uj5X-uLGj9lJlEbdE26vGS4wmfhsaGaPgE1K7soZohhboXnVh/s1600/images.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="214" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTeMSPA4u6sPVZM9Z-XFLUDeDhyiERJ7bDz_aaPAgAstJsWFYrfgxWU1LEvsoh4rJT_Ivqrz7oZaSVt_riURPtaDJIPo2Uj5X-uLGj9lJlEbdE26vGS4wmfhsaGaPgE1K7soZohhboXnVh/s320/images.jpg" width="320" /></a></div>
<span style="background-color: white;"><br /></span></div>
<div style="text-align: justify;">
<span style="background-color: white;"><br /></span></div>
<div style="text-align: justify;">
<span style="background-color: white;">These two functions "repeat themselves" every 2π radians (<i>i.e</i>., 360 degrees). This corresponds, of course, to going exactly once around a circle.</span></div>
<div style="text-align: justify;">
<span style="background-color: white;"><br /></span></div>
<div style="text-align: justify;">
So, the root of (1) corresponding to L = 1 can be written as (1 + i 0) = (cosθ + i sinθ), and from the sine and cosine graphs we can see that this implies that θ = 0 (or 0 ± any multiple of 2π, which is equivalent). This is the zero frequency: the usual "long-run" unit root, associated with no cycles at all.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="background-color: white;">Similarly, </span>the root of (1) corresponding to L = -1 can be written as (-1 + i0) = (cosθ + i sinθ), and from the sine and cosine graphs we can see that this implies that θ = <span style="background-color: white;">π (or </span><span style="background-color: white;">π</span><span style="background-color: white;"> </span><span style="background-color: white;">+/- multiples of </span>2π)<span style="background-color: white;">. Now the series exhibits two cycles per year.</span><br />
<br />
(We don't need to worry about the additional multiples of 2π, as this would take us "around the circle" more than once. So let's forget about this detail.)</div>
<div style="text-align: justify;">
<span style="background-color: white;"><br /></span></div>
<div style="text-align: justify;">
<span style="background-color: white;">Finally, </span>the roots of (1) corresponding to L = +/-i can be written as (0 +/- i) = (cosθ + i sinθ), and from the sine and cosine graphs we can see that this implies that θ = π/2 or 3π/2. This conjugate pair corresponds to the annual frequency: one cycle per year, with a period of four quarters (the 3π/2 root being the alias of the π/2 root).<br />
<br />
To summarize, we can have roots of equation (1) that correspond to one or more of the zero, π, or π/2 or 3π/2 frequencies. Moreover, these last two frequencies really need to be thought of as a pair - after all, they're associated with a complex conjugate pair in the Cartesian coordinate system, whereas the other two roots are "real".</div>
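To connect each root with its frequency numerically (again just an illustrative sketch using numpy), we can read off θ as the angular (polar) coordinate of each root:

```python
import numpy as np

# The four roots of (1 - L^4) = 0, written as complex numbers
roots = [1, -1, 1j, -1j]

# The angular coordinate theta of each root, mapped into [0, 2*pi)
thetas = np.mod(np.angle(roots), 2 * np.pi)
freq = dict(zip(['1', '-1', 'i', '-i'], thetas))
# L = 1 -> theta = 0; L = -1 -> pi; L = i -> pi/2; L = -i -> 3*pi/2
```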
<div style="text-align: justify;">
<br />
So much for the language associated with seasonal unit roots.<br />
<br />
<b><span style="color: #cc0000;">Filtering the data</span></b><br />
<br />
What filters are needed to eliminate the various roots, so as to render the X series stationary?</div>
<div style="text-align: justify;">
<br />
<b>(i)</b> The root L = 1 corresponds to the filter (1 - L). In other words, if there is a unit root at the zero frequency then we need to construct Y<sub>t</sub> = (1 - L)X<sub>t</sub> = (X<sub>t</sub> - X<sub>t-1</sub>) to get a stationary series. The usual first-differencing of the data is appropriate.<br />
<br />
<b>(ii)</b> The root L = -1 corresponds to the filter (1 + L). That is, if there is a unit root at the π frequency then we need to construct Y<sub>t</sub> = (1 + L)X<sub>t</sub> = (X<sub>t</sub> + X<sub>t-1</sub>) to get a stationary series. Notice that this particular filter <i><b>doesn't</b></i> involve "differencing" the data.</div>
</div>
</div>
</div>
<br />
<div style="text-align: justify;">
<b>(iii)</b> The pair of roots L = i and L = -i corresponds to the filter (1 + L<sup>2</sup>). So, if there is a unit root at the π/2 or 3π/2 pair of frequencies, then we need to construct Y<sub>t</sub> = (1 + L<sup>2</sup>)X<sub>t</sub> = (X<sub>t</sub> + X<sub>t-2</sub>) to get a stationary series. Again, this filter doesn't involve the usual "differencing".</div>
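A small simulation confirms that each filter removes exactly the root it targets (my own sketch; the data-generating process is an assumption chosen for illustration). For instance, a series with a unit root at the π frequency, X<sub>t</sub> = -X<sub>t-1</sub> + ε<sub>t</sub>, is reduced to white noise by the (1 + L) filter:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 500
eps = rng.standard_normal(T)

# Unit root at the pi frequency: (1 + L)X_t = eps_t, i.e. X_t = -X_{t-1} + eps_t
x = np.zeros(T)
for t in range(1, T):
    x[t] = -x[t - 1] + eps[t]

# Apply the (1 + L) filter: Y_t = X_t + X_{t-1}
y = x[1:] + x[:-1]
# y recovers the white-noise shocks, so the filtered series is stationary
```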
<br />
<div style="text-align: justify;">
<b style="text-align: left;"><span style="color: #cc0000;">Multiple roots</span></b></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Of course, it's quite possible that our time series, X<sub>t</sub>, has unit roots at more than one frequency. For example, it may have roots at both the zero and π frequencies. In that case, the filter that will make the series stationary is (1 - L)(1 + L) = (1 - L<sup>2</sup>). So, we construct Y<sub>t</sub> = (X<sub>t</sub> - X<sub>t-2</sub>). Similarly, if X has unit roots at all of the <i>seasonal</i> frequencies, but not at the zero frequency, then the appropriate filter is (1 + L)(1 + L<sup>2</sup>) = (1 + L + L<sup>2</sup> + L<sup>3</sup>), and the series Y<sub>t</sub> = (X<sub>t</sub> + X<sub>t-1</sub> + X<sub>t-2</sub> + X<sub>t-3</sub>) will be stationary; and so on. If there are unit roots at all four frequencies, then the X series is said to be "seasonally integrated", and the relevant filter is (1 - L)(1 + L)(1 + L<sup>2</sup>) = (1 - L<sup>4</sup>), and so we "fourth-difference" X<sub>t</sub> and form Y<sub>t</sub> = (X<sub>t</sub> - X<sub>t-4</sub>).</div>
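The filter algebra here is just polynomial multiplication, which we can verify mechanically (an illustrative sketch; numpy's convolution of coefficient vectors does the work, with coefficients listed in ascending powers of L):

```python
import numpy as np

one_minus_L = [1, -1]       # (1 - L)
one_plus_L  = [1, 1]        # (1 + L)
one_plus_L2 = [1, 0, 1]     # (1 + L^2)

# (1 + L)(1 + L^2) = 1 + L + L^2 + L^3  -- the purely seasonal filter
seasonal = np.polymul(one_plus_L, one_plus_L2)

# (1 - L)(1 + L)(1 + L^2) = 1 - L^4     -- fourth-differencing
fourth_diff = np.polymul(one_minus_L, seasonal)
```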
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<b><span style="color: #cc0000;">Some Extensions</span></b> </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The above discussion is cast in terms of <i>quarterly</i> time-series data. If we have data that are recorded twice-yearly, you should be able to see from the factorization (1 - L<sup>2</sup>) = (1 - L)(1 + L) that there can be unit roots only at either the zero or π frequencies. (See Feltham and Giles, 2003, for more on this.)<br />
<br />
You might guess that the case of monthly data gets pretty messy! (See Beaulieu and Miron, 1993.) In this case the unit roots are the twelve 12th roots of unity: L = ±1, ±i, (±1 ± i√3)/2, and (±√3 ± i)/2. The various frequencies are zero, π/6, π/3, π/2, 2π/3, 5π/6, and π.<br />
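The monthly case can also be checked numerically (again just an illustrative sketch): the roots of 1 - L<sup>12</sup> = 0 are the twelve 12th roots of unity, whose angles are integer multiples of π/6:

```python
import numpy as np

# Roots of 1 - L^12 = 0 (coefficients in descending powers of L)
coeffs = np.zeros(13)
coeffs[0], coeffs[-1] = -1.0, 1.0
roots = np.roots(coeffs)

# Express each root's angle as an integer multiple of pi/6, modulo 2*pi
multiples = np.sort(np.round(np.angle(roots) / (np.pi / 6)).astype(int) % 12)
# every multiple 0, 1, ..., 11 appears exactly once
```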
<br />
The final matter that needs mentioning here is the possibility of cointegration when we have two or more seasonal time series. Suppose that X<sub>1t</sub> and X<sub>2t</sub> are quarterly series, and they each have unit roots at (say) the π frequency. Then it's possible that they may be cointegrated at this frequency. Similarly, if X<sub>1t</sub> has unit roots at all frequencies, and X<sub>2t</sub> has unit roots at the π and zero frequencies, then the two series may be cointegrated at the zero and/or π frequencies. And so on.<br />
<br />
Engle <i>et al</i>. (1993) provide and illustrate a systematic testing framework for seasonal cointegration. It's essentially a generalization of the Engle-Granger two-step cointegration test, with the HEGY tests replacing the ADF test. For another application, see Reinhardt and Giles (2001). Lee (1992) extends the Johansen cointegration tests to the case of (quarterly) seasonal cointegration. A nice application of this procedure is given by Debenedictis (1997).<br />
<br /></div>
<div style="text-align: justify;">
<b>References</b></div>
<div style="text-align: justify;">
<b></b></div>
<div style="text-align: justify;">
<b><br /></b></div>
<div style="text-align: justify;">
<a href="http://www.sciencedirect.com/science/article/pii/030440769390018Z">Beaulieu, J. J., & J. A. Miron</a>, 1993. Seasonal unit roots in aggregate U.S. data. <i>Journal of Econometrics</i>, 55, 305-328.</div>
<div style="text-align: justify;">
<br />
<a href="https://www.tandfonline.com/doi/abs/10.1080/000368497326534">Debenedictis, L. F.</a>, 1997. A vector autoregressive model of the British Columbia regional economy. <i>Applied Economics</i>, 29, 877-888. </div>
<div style="text-align: justify;">
<br />
<a href="https://www.tandfonline.com/doi/abs/10.1080/000368497326534">Engle, R. F., C. W. J. Granger, S. Hylleberg, & H. S. Lee</a>, 1993. The Japanese consumption function. <i>Journal of Econometrics</i>, 55, 275-298.<br />
<br />
Feltham, S. G. & D. E. A. Giles, 2003. Testing for unit roots in semi-annual data. In D. E. A. Giles (ed.), <i>Computer-Aided Econometrics</i>. Marcel Dekker, New York, 175-208. (<a href="http://web.uvic.ca/~dgiles/blog/HEGY_paper.pdf">Pre-print here</a>.)</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<a href="http://www.sciencedirect.com/science/article/pii/0304407694900302">Ghysels, E., H. S. Lee, & J. Noh</a>, 1994. Testing for unit roots in seasonal time series: Some theoretical extensions and a Monte Carlo investigation. <i>Journal of Econometrics</i>, 62, 415-442.<br />
<br />
<a href="https://www.sciencedirect.com/science/article/pii/030440769090080D">Hylleberg, S., R. F. Engle, C. W. J. Granger, & B. S. Yoo</a>, 1990. Seasonal integration and cointegration. <i>Journal of Econometrics</i>, 44, 215-228.<br />
<br /></div>
<div style="text-align: justify;">
<a href="https://www.sciencedirect.com/science/article/pii/030440769290098C">Lee, H. S.</a>, 1992. Maximum likelihood inference on cointegration and seasonal cointegration. <i>Journal of Econometrics</i>, 54, 1-47.</div>
<div style="text-align: justify;">
<br />
<a href="https://www.tandfonline.com/doi/abs/10.1080/00036840010007489">Reinhardt, F. S. & D. E. A. Giles</a>, 2001. Are cigarette bans really good economic policy? <i>Applied Economics</i>, 33, 1365-1368. </div>
<div style="text-align: justify;">
<br /></div>
<center>
© 2019, David E. Giles</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com5tag:blogger.com,1999:blog-2198942534740642384.post-47370841165954151942019-07-01T06:38:00.002-07:002019-07-01T06:38:37.115-07:00July Reading<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
This month my reading list is a bit different from the usual one. I've taken a look back at past issues of <i>Econometrica</i> and <i>Journal of Econometrics</i>, and selected some important and interesting papers that happened to be published in July issues of those journals.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Here's what I came up with for you:</div>
<div style="text-align: justify;">
</div>
<ul>
<li style="text-align: justify;"><a href="https://www.sciencedirect.com/science/article/pii/0304407677900525">Aigner, D., C. A. K. Lovell, & P. Schmidt</a>, 1977. Formulation and estimation of stochastic frontier production function models. <i>Journal of Econometrics</i>, 6, 21-37.</li>
<li style="text-align: justify;"><a href="https://www.econometricsociety.org/publications/econometrica/1960/07/01/tests-equality-between-sets-coefficients-two-linear-regressions">Chow, G. C.</a>, 1960. Tests of equality between sets of coefficients in two linear regressions. <i>Econometrica</i>, 28, 591-605.</li>
<li style="text-align: justify;"><a href="https://www.sciencedirect.com/science/article/pii/0304407684900010">Davidson, R. & J. G. MacKinnon</a>, 1984. Convenient specification tests for logit and probit models. <i>Journal of Econometrics</i>, 25, 241-262.</li>
<li style="text-align: justify;"><a href="https://www.econometricsociety.org/publications/econometrica/1981/07/01/likelihood-ratio-statistics-autoregressive-time-series-unit">Dickey, D. A. & W. A. Fuller</a>, 1981. Likelihood ratio statistics for autoregressive time series with a unit root. <i>Econometrica</i>, 49, 1057-1072.</li>
<li style="text-align: justify;"><a href="https://www.sciencedirect.com/science/article/pii/0304407674900347">Granger, C. W. J. & P. Newbold</a>, 1974. Spurious regressions in econometrics<i>. Journal of Econometrics</i>, 2, 111-120.</li>
<li style="text-align: justify;"><a href="https://www.econometricsociety.org/publications/econometrica/1961/07/01/maximum-likelihood-estimation-economic-relationships">Sargan, J. D.</a>, 1961. The maximum likelihood estimation of economic relationships with autoregressive residuals. <i>Econometrica</i>, 29, 414-426. </li>
</ul>
<center style="text-align: center;">
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com0tag:blogger.com,1999:blog-2198942534740642384.post-8312541512836087722019-06-21T13:25:00.003-07:002019-06-21T13:25:57.859-07:00Consulting Can be Fun!<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
Over the years, I've done a modest amount of paid econometrics consulting work - in the U.S., New Zealand, Australia, the U.K., and here in Canada. Each job has been interesting and rewarding, and I've always learned a great deal from the briefs that I've undertaken.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The other day, a friend asked me, "Which consulting job was the most fun?"</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Actually, the answer was easy!</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
A few years ago I consulted for the Office of the Auditor General of Canada, in Ottawa. I was brought in because I had consulted for Revenue New Zealand on the issue of tax evasion, and I had co-authored <a href="https://www.ctf.ca/ctfweb/en/publications/product_detail.aspx?prod=TP106">a book on the Canadian "underground economy" </a>with <a href="https://www.linkedin.com/in/lindsay-tedds-b68b4311/?originalSubdomain=ca">Lindsay Tedds</a>.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
So what was the consulting work with the Auditor General's office all about? Well, they were conducting an audit of what was then called Revenue Canada (now the Canada Revenue Agency). In other words, "the tax man"!</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Although the report arising from this audit is a matter of public record, I won't go into it here. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Suffice it to say, what could be more fun than conducting an audit of your country's tax authority?</div>
<div style="text-align: justify;">
<br /></div>
<center>
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com0tag:blogger.com,1999:blog-2198942534740642384.post-47846025606683284782019-06-20T11:13:00.001-07:002019-06-20T11:13:43.929-07:002019 Edition of the INOMICS Handbook<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
I'm sure that all readers will be familiar with <a href="https://inomics.com/">INOMICS</a>, and the multitude of resources that they make available to economists.</div>
<div style="text-align: justify;">
<br /></div>
The <a href="https://inomics.com/handbook" style="font-style: italic;">INOMICS Handbook, 2019</a> is now available, and I commend it to you.<br />
<br />
This year's edition of the <i>Handbook</i> includes material relating to:<br />
<ul style="text-align: left;">
<li>The gender bias in the field of economics</li>
<li>The soft skills you need to succeed as an economist</li>
<li>Climate change and how economics can help solve it</li>
<li>What makes a successful economist</li>
<li>An exclusive interview with Princeton Professor, Esteban Rossi-Hansberg</li>
<li>Winners of the INOMICS Awards 2019</li>
<li>Recommended study and career opportunities</li>
</ul>
<center>
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com0tag:blogger.com,1999:blog-2198942534740642384.post-83506038056500672862019-06-11T10:02:00.001-07:002019-06-11T10:02:01.739-07:00More Tributes to Clive Granger<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
As a follow-up to my recent post, "<a href="https://davegiles.blogspot.com/2019/06/clive-granger-special-issue.html">Clive Granger Special Issue</a>", I received an email from <a href="http://www.eyupcetin.com/">Eyüp Çetin</a> (Editor of the <i>European Journal of Pure and Applied Mathematics</i>)<i>.</i></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<a href="http://www.eyupcetin.com/">Eyüp</a> kindly pointed out that "... <span style="font-family: inherit;"><span style="text-align: left;">actually, we published the first special issue dedicated to his memory exactly on 27 May 2010, the first anniversary of his passing at </span><a href="https://www.ejpam.com/index.php/ejpam/issue/view/11" rel="noreferrer" style="text-align: left;" target="_blank">https://www.ejpam.com/index.php/ejpam/issue/view/11</a><span style="text-align: left;"> </span></span></div>
<div class="gmail_default">
<span style="font-family: inherit;"><br /></span></div>
<div class="gmail_default">
<span style="font-family: inherit;">We think this was the first special issue dedicated to his memory in the world. </span><span style="font-family: inherit;">The Table of Contents may be found here </span><a href="https://www.ejpam.com/index.php/ejpam/issue/view/11/showToc" rel="noreferrer" style="font-family: inherit;" target="_blank">https://www.ejpam.com/index.php/ejpam/issue/view/11/showToc</a> .</div>
<div class="gmail_default" style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div class="gmail_default" style="text-align: justify;">
<span style="font-family: inherit;">Another remarkable point that we also published some personal and institutional tributes and some memorial stories for Sir Granger that never appeared elsewhere before at </span></div>
<div class="gmail_default" style="text-align: justify;">
<span style="font-family: inherit;"><a href="https://www.ejpam.com/index.php/ejpam/article/view/805" rel="noreferrer" target="_blank">https://www.ejpam.com/index.php/ejpam/article/view/805</a> .</span></div>
<div class="gmail_default" style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div class="gmail_default" style="text-align: justify;">
<span style="font-family: inherit;">Some institutions such as Royal Statistical Society, Japan Statistical Society and University of Canterbury have sent their tributes to this special volume." </span></div>
<div class="gmail_default" style="color: #000099; font-family: georgia, serif;">
<br style="background-color: white; font-size: 11px;" /></div>
<center>
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com0tag:blogger.com,1999:blog-2198942534740642384.post-26463114217657849732019-06-07T00:43:00.004-07:002019-06-11T09:51:48.763-07:00Clive Granger Special Issue<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
The recently published <a href="https://www.ejpam.com/index.php/ejpam/issue/view/39">Volume 10, No. 1 issue</a> of the <i>European Journal of Pure and Applied Mathematics</i> takes the form of a memorial issue for Clive Granger. You can find the Table of Contents <a href="https://www.ejpam.com/index.php/ejpam/issue/view/39/showToc">here</a>, and all of the articles can be downloaded freely.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This memorial issue is co-edited by Jennifer Castle and David Hendry. The contributed papers include ones that deal with Forecasting, Cointegration, Nonlinear Time Series, and Model Selection.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This is a fantastic collection of important survey-type papers that you simply <i>must read</i>!<br />
<br /></div>
<center>
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com1tag:blogger.com,1999:blog-2198942534740642384.post-31612147128739959702019-05-31T10:01:00.000-07:002019-05-31T10:01:16.082-07:00Reading Suggestions for June<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
Well, here we are - it's June already.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Here are my reading suggestions:</div>
<div style="text-align: justify;">
</div>
<ul>
<li style="text-align: justify;"><a href="https://arxiv.org/abs/1710.02926">Abadie, A., S. Athey, G. Imbens, & J. Wooldridge</a>, 2017. When should you adjust standard errors for clustering? Mimeo.</li>
<li style="text-align: justify;"><a href="https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1592781">Berk, R., A. Buja, L. Brown, E. George, A. K. Kuchibhotla, W. Su, & L. Zhao</a>, 2019. Assumption lean regression. <i>American Statistician</i>, in press.</li>
<li style="text-align: justify;"><a href="https://www.tandfonline.com/doi/full/10.1080/00031305.2018.1529626">Ghosh, T., M. Ghosh, & T. Kubokawa</a>, 2019. On the loss robustness of least-square estimators, <i>American Statistician</i>, in press.</li>
<li style="text-align: justify;"><a href="https://www.tandfonline.com/doi/full/10.1080/03610926.2018.1528369">Gustafsson, O. & P. Stockhammar</a>, 2019. Variance stabilizing filters. <i>Communications in Statistics - Theory and Methods</i>, in press.</li>
<li style="text-align: justify;"><a href="https://drive.google.com/file/d/1OBQEclDiOJVjzSr9xNQzAFweN8vMYGVD/view">Kocherlakota, N. R.</a>, 2019. A near-exact finite sample theory for an instrumental variable estimator. Mimeo. (Hat-tip to Frank Diebold.)</li>
<li style="text-align: justify;"><a href="https://www.sciencedirect.com/science/article/pii/S0169207019300056">Panagiotelis, A., G. Athanasopoulos, R. J. Hyndman, B. Jiang, & F. Vahid</a>, 2019. Macroeconomic forecasting for Australia using a large number of predictors. <i>International Journal of Forecasting</i>, 35, 613-633.</li>
</ul>
<center>
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com0tag:blogger.com,1999:blog-2198942534740642384.post-75240359550960617862019-05-19T11:27:00.004-07:002019-05-19T11:28:20.128-07:00Update on the "Series of Unsurprising Results in Economics"<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
In June of last year I had <a href="https://davegiles.blogspot.com/2018/06/the-series-of-unsurprising-results-in.html">a post</a> about a new journal, <i><a href="https://blogs.canterbury.ac.nz/surejournal/">Series of Unsurprising Results in Economics</a> </i>(<i>SURE</i>).</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
If you didn't get to read that post, I urge you to do so. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
More importantly, you should definitely take a look at <a href="https://www.vox.com/future-perfect/2019/5/17/18624812/publication-bias-economics-journal">this piece</a> by Kelsey Piper, from a couple of days ago, and titled, "This economics journal only publishes results that are no big deal - Here’s how that might save science".</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Kelsey really understands the rationale for <i>SURE</i>, and the important role that it can play in terms of reducing publication bias, and assisting with replicating results.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
You can get a feel for what SURE has to offer by checking out <a href="https://ir.canterbury.ac.nz/handle/10092/16670">this paper</a> by Nick Huntington-Klein and Andrew Gill, which the journal is publishing.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
We'll all be looking forward to more excellent papers like this!</div>
<div style="text-align: justify;">
<br /></div>
<center>
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com0tag:blogger.com,1999:blog-2198942534740642384.post-35941466601637768302019-05-01T09:00:00.004-07:002019-05-02T12:37:06.305-07:00May Reading List<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
Here's a selection of suggested reading for this month:</div>
<ul style="text-align: left;">
<li><a href="https://arxiv.org/pdf/1903.10075.pdf" style="text-align: justify;">Athey, S. & G. W. Imbens</a><span style="text-align: justify;">, 2019. Machine learning methods economists should know about. Mimeo.</span></li>
<li><a href="https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1604432" style="text-align: justify;">Bhagwat, P. & E. Marchand, 2019.</a><span style="text-align: justify;"> On a proper Bayes but inadmissible estimator. </span><i style="text-align: justify;">American Statistician</i><span style="text-align: justify;">, online.</span></li>
<li><a href="https://www.tandfonline.com/doi/full/10.1080/00949655.2019.1602125" style="text-align: justify;">Canals, C. & A. Canals</a><span style="text-align: justify;">, 2019. When is n large enough? Looking for the right sample size to estimate proportions. </span><i style="text-align: justify;">Journal of Statistical Computation and Simulation</i><span style="text-align: justify;">, 89, 1887-1898.</span></li>
<li><a href="https://www.economics.ku.dk/research/publications/wp/dp_2019/1903.pdf" style="text-align: justify;">Cavaliere, G. & A. Rahbek</a><span style="text-align: justify;">, 2019. A primer on bootstrap testing of hypotheses in time series models: With an application to double autoregressive models. Discussion Paper 19-03, Department of Economics, University of Copenhagen.</span></li>
<li><a href="https://www.dallasfed.org/~/media/documents/institute/wpapers/2019/0356.pdf" style="text-align: justify;">Chudik, A. & G. Georgiadis, 2019</a><span style="text-align: justify;">. Estimation of impulse response functions when shocks are observed at a higher frequency than outcome variables. Globalization Institute Working Paper 356, Federal Reserve Bank of Dallas.</span></li>
<li style="text-align: justify;"><a href="https://www.tandfonline.com/doi/full/10.1080/03610918.2017.1408826">Reschenhofer, E.</a>, 2019. Heteroscedasticity-robust estimation of autocorrelation. <i>Communications in Statistics - Simulation and Computation</i>, 48, 1251-1263.</li>
</ul>
<div>
</div>
<center style="text-align: justify;">
</center>
<center style="text-align: justify;">
</center>
<center style="text-align: center;">
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com2tag:blogger.com,1999:blog-2198942534740642384.post-31567498008259575082019-04-29T17:50:00.003-07:002019-05-02T12:38:49.120-07:00Recursions for the Moments of Some Continuous Distributions<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: inherit;">This post follows on from my recent one, <i><a href="https://davegiles.blogspot.com/2019/04/recursions-for-moments-of-some-discrete.html">Recursions for the Moments of Some Discrete Distributions</a></i>. I'm going to assume that you've read the previous post, so this one will be shorter. </span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">What I'll be discussing here are some useful recursion formulae for computing the moments of a number of continuous distributions that are widely used in econometrics. The coverage won't be exhaustive, by any means. I provide some motivation for looking at formulae such as these in the previous post, so I won't repeat it here. </span></div>
<div style="text-align: justify;">
<span style="font-family: inherit; font-weight: 700;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">When we deal with the Normal distribution, below, we'll make explicit use of Stein's Lemma. Several of the other results are derived (behind the scenes) by using a very similar approach. So, let's begin by stating this Lemma.</span><br />
<span style="font-family: inherit;"><br /></span>
<b><span style="font-family: inherit;">Stein's Lemma (Stein, 1973):</span></b></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><b></b></span></div>
<div style="color: black; font-style: normal; font-weight: 400; letter-spacing: normal; text-align: justify; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">
</div>
<span style="font-family: inherit;"><span style="font-family: inherit;"><br /></span>
</span><br />
<div style="color: black; font-style: normal; font-weight: 400; letter-spacing: normal; text-align: justify; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">
<div style="margin: 0px;">
<span style="font-family: inherit;">"If <i>X</i> ~ N[<i>θ</i> , <i>σ</i><sup>2</sup>], and if <i>g</i>(.) is a differentiable function such that E|<i>g</i>'(<i>X</i>)| is finite, then </span></div>
<div style="margin: 0px;">
<span style="font-family: inherit;"><br /></span></div>
<div style="margin: 0px;">
<span style="font-family: inherit;"> E[<i>g</i>(<i>X</i>)(<i>X</i> - <i>θ</i>)] = <i>σ</i><sup>2</sup> E[<i>g</i>'(<i>X</i>)]."</span></div>
</div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">It's worth noting that although this lemma relates to a single <b style="font-style: italic;">Normal</b> random variable, in the bivariate Normal case the lemma generalizes to:</span><br />
<span style="font-family: inherit;"><br /></span>
<br />
<div style="color: black; font-weight: 400; letter-spacing: normal; text-align: justify; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">
<div style="margin: 0px;">
<div style="margin: 0px;">
<span style="font-family: inherit;">"If <i>X</i><i style="font-style: normal;"> </i>and <i>Y</i> follow a bivariate Normal distribution, and if <i style="font-style: normal;">g</i>(.) is a differentiable function such that E|<i style="font-style: normal;">g</i>'(<i>Y</i>)| is finite, then </span></div>
</div>
<div style="font-style: normal; margin: 0px;">
<div style="margin: 0px;">
<span style="font-family: inherit;"><br /></span></div>
</div>
<div style="margin: 0px;">
<div style="margin: 0px;">
<span style="font-family: inherit;"> Cov.[<i style="font-style: normal;">g</i>(<i>Y </i>)<i>, </i><i>X</i>] = Cov.(<i>X</i> , <i>Y</i>) E[<i style="font-style: normal;">g</i>'(<i>Y</i>)]."</span></div>
</div>
</div>
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">In this latter form, the lemma is useful in asset pricing models.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">There are extensions of Stein's Lemma to a broader class of univariate and multivariate distributions. For example, see Alghalith (undated) and Landsman <i>et al</i>. (2013), and the references in those papers. Generally, if a distribution belongs to an <a href="https://en.wikipedia.org/wiki/Exponential_family">exponential family</a>, then recursions for its moments can be obtained quite easily.</span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Now, let's get down to business............</span><br />
<span style="font-family: inherit;"><br /></span>
<br />
<a name='more'></a><span style="font-family: inherit;"><span style="background-color: white;">Recall that t</span><span style="background-color: white; color: #444444;">he <em>r</em><sup>th.</sup> "raw moment" (or "moment about zero") for the random variable <em>X</em>, with distribution function <em>F</em>,<em> </em>is defined as </span><em style="background-color: white; color: #444444;">μ<sub>r</sub></em><span style="background-color: white; color: #444444;">' = E[</span><em style="background-color: white; color: #444444;">X <sup>r</sup></em><span style="background-color: white; color: #444444;">] </span><span style="background-color: white; color: #444444;">= ∫ </span><em style="background-color: white; color: #444444;">x<sup>r</sup>dF</em><span style="background-color: white; color: #444444;">(</span><em style="background-color: white; color: #444444;">x</em><span style="background-color: white; color: #444444;">) , for <i>r</i> = 1, 2, .....; and the </span><span style="color: #444444;">"central (centered) moments" of <i>X </i>are defined as </span><em style="color: #444444;">μ<sub>r</sub></em><span style="background-color: white; color: #444444;"> = E[(</span><em style="color: #444444;">X - </em><span style="color: #444444;"><i>μ</i><sub>1</sub></span><span style="background-color: white; color: #444444;">' </span><span style="color: #444444;">)</span><em style="color: #444444;"><sup>r</sup></em><span style="background-color: white; color: #444444;">] , for <i>r</i> = 1, 2, 3, ....... </span></span><br />
<span style="font-family: inherit;"><span style="background-color: white; color: #444444;"><br /></span>
Also, we can express one of these sets of moments in terms of the other set by using the two relationships:</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"> <i>μ<sub>r</sub></i> = Σ{[<i>r</i>! / (<i>i</i>! (<i>r</i> - <i>i</i>)!)] <i>μ<sub>i</sub></i>' (- <i>μ</i><sub>1</sub><i>' </i>) <sup><i>r-i </i></sup>} , (1)</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">where the range of summation is from <i>i</i> = 0 to <i>i</i> = <i>r </i>; and</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"> <i>μ<sub>r</sub></i>' = Σ{[<i>r</i>! / (<i>i</i>! (<i>r</i> - <i>i</i>)!)] <i>μ<sub>i</sub></i><sup><i> </i></sup>(<i>μ</i><sub>1</sub><i>' </i>) <sup><i>r-i </i></sup>} , (2)</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">where, again, the range of summation is from <i>i</i> = 0 to <i>i</i> = <i>r</i>.</span><br />
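<div style="text-align: justify;"><span style="font-family: inherit;">As an aside, relationships such as (1) are trivial to implement. Here's a minimal Python sketch of my own (not part of the original derivations; the function name is purely illustrative) that converts raw moments into central moments via equation (1):</span></div>

```python
# Central moments from raw moments via equation (1):
# mu_r = sum_i [r! / (i! (r-i)!)] * mu_i' * (-mu_1')^(r-i), for i = 0..r.
from math import comb

def central_from_raw(raw):
    """raw[r-1] = mu_r' = E[X^r]; returns [mu_1, mu_2, ..., mu_len(raw)]."""
    mu1 = raw[0]
    raws = [1.0] + list(raw)            # prepend mu_0' = 1
    return [sum(comb(r, i) * raws[i] * (-mu1) ** (r - i)
                for i in range(r + 1))
            for r in range(1, len(raws))]

# Check with X ~ N[2, 9], whose raw moments are theta, sigma^2 + theta^2,
# theta^3 + 3*theta*sigma^2, theta^4 + 6*theta^2*sigma^2 + 3*sigma^4:
print(central_from_raw([2.0, 13.0, 62.0, 475.0]))
# -> [0.0, 9.0, 0.0, 243.0], i.e. 0, sigma^2, 0, 3*sigma^4, as expected.
```
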
<span style="font-family: inherit;"><span style="background-color: white; color: #444444;"><br /></span>
<b><b><span style="color: blue;">Normal distribution</span></b></b></span></div>
<span style="font-family: inherit; text-align: justify;">If </span><i style="font-family: inherit; text-align: justify;">X</i><span style="font-family: inherit; text-align: justify;"> ~ N[</span><i style="font-family: inherit; text-align: justify;">θ</i><span style="font-family: inherit; text-align: justify;"> , </span><i style="font-family: inherit; text-align: justify;">σ</i><sup style="font-family: inherit; text-align: justify;">2</sup><span style="font-family: inherit; text-align: justify;">], then we know that </span><i style="font-family: inherit; text-align: justify;">μ</i><sub style="font-family: inherit; text-align: justify;">1</sub><span style="font-family: inherit; text-align: justify;">' = E[</span><i style="font-family: inherit; text-align: justify;">X</i><span style="font-family: inherit; text-align: justify;">] = </span><i style="font-family: inherit; text-align: justify;">θ</i><span style="font-family: inherit; text-align: justify;">, and </span><i style="font-family: inherit; text-align: justify;">μ</i><sub style="font-family: inherit; text-align: justify;">2</sub><span style="font-family: inherit; text-align: justify;">' = E[</span><i style="font-family: inherit; text-align: justify;">X </i><sup style="font-family: inherit; text-align: justify;">2</sup><span style="font-family: inherit; text-align: justify;">] = </span><i style="font-family: inherit; text-align: justify;">σ</i><sup style="font-family: inherit; text-align: justify;">2</sup><span style="font-family: inherit; text-align: justify;"> + </span><i style="font-family: inherit; text-align: justify;">θ</i><sup style="font-family: inherit; text-align: justify;">2</sup><span style="font-family: inherit; text-align: justify;">.</span><br />
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">Also, note that we can write:</span><br />
<span style="font-family: inherit;"><span style="text-align: left;"><br /></span><span style="text-align: left;"><i> μ</i><sub>3</sub>' = E[<i>X </i></span><sup style="text-align: left;">3</sup><span style="text-align: left;">] = E[<i>X </i></span><sup style="text-align: left;">2</sup><span style="text-align: left;">(<i>X</i> - <i>θ</i> + <i>θ</i>)] = E[<i>X </i></span><sup style="text-align: left;">2</sup><span style="text-align: left;">(<i>X</i> - <i>θ</i>)] + <i>θ </i>E[<i>X </i></span><sup style="text-align: left;">2</sup><span style="text-align: left;">] .</span></span></div>
</div>
<div style="margin: 0px;">
<span style="font-family: inherit;"><span style="text-align: justify; vertical-align: sub;">Applying Stein's Lemma, with <i>g</i>(<i>X</i>) = <i>X </i></span><sup style="font-size: medium; text-align: justify;">2</sup><span style="text-align: justify; vertical-align: sub;">, we obtain:</span><sub></sub></span><br />
<span style="font-family: inherit;"><span style="font-family: "times new roman"; font-size: small;"></span><br /></span>
<br />
<div style="text-align: justify;">
<span style="font-family: inherit;"> <i>μ</i><sub>3</sub>' = 2<i>σ</i><sup>2 </sup>E[<i>X</i>] + <i>θ</i> E[<i>X </i><sup>2</sup>] = <i>θμ</i><sub>2</sub>' + 2<i>σ<sup>2</sup>μ</i><sub>1</sub>' = <i>θ</i><sup>3</sup> + 3<i>θ</i><i>σ</i><sup>2</sup>.</span><br />
<span style="font-family: inherit;"><span style="font-family: "times new roman";"><br /></span>
<span style="font-family: inherit;">Similarly, if we apply the Lemma with <i>g</i>(<i>X</i>) = <i>X <sup>k</sup></i> ; <i>k</i> = 3, 4, ......, we obtain:</span></span><br />
<span style="font-family: inherit;"><span style="font-family: "times new roman";"><br /></span>
<span style="font-family: "times new roman";"> <i>μ</i><sub>4</sub>' = E[<i>X </i><sup>4</sup>] = E[<i>X </i><sup>3</sup>(<i>X</i> -<i> θ</i>)] + <i>θ </i>E[<i>X </i><sup>3</sup>] = <i>θμ</i><sub>3</sub>' + 3<i>σ</i><sup>2</sup><i>μ</i><sub>2</sub>' = <i>θ</i><sup>4</sup> + 6<i>θ</i><sup>2</sup><i>σ</i><sup>2</sup> + 3<i>σ</i><sup>4</sup></span><span style="font-family: "times new roman";">.</span></span><br />
<span style="font-family: inherit;"><span style="font-family: "times new roman";"><br /></span>
<span style="font-family: "times new roman";"> <i>μ</i><sub>5</sub>' = E[<i>X </i><sup>5</sup>] = E[<i>X </i><sup>4</sup>(<i>X</i> - <i>θ</i>)] + <i>θ </i>E[<i>X </i><sup>4</sup>] = <i>θμ</i><sub>4</sub>' + 4<i>σ</i><sup>2</sup><i>μ</i><sub>3</sub>' = <i>θ</i><sup>5</sup> + 10<i>θ</i><sup>3</sup><i>σ</i><sup>2</sup> + 15<i>θσ</i><sup>4</sup>;</span></span><br />
<span style="font-family: inherit;"><span style="font-family: "times new roman";"><br /></span>
<span style="font-family: "times new roman";">and so on.</span></span><br />
<span style="font-family: inherit;"><span style="font-family: "times new roman";"><br /></span>
<span style="font-family: inherit;">So, each moment is obtained recursively from all of the lower-order moments, through the repeated use of Stein's Lemma.</span></span><br />
<span style="font-family: inherit;"><span style="font-family: "times new roman";"><br /></span>
<span style="font-family: inherit;"><span style="font-family: "times new roman";">More particularly, you can see immediately </span><span style="font-family: "times new roman";">that following recursion formula holds for the raw moments of the Normal distribution (Bain, 1969; Willink, 2005):</span></span></span><br />
<span style="font-family: inherit;"><br /></span></div>
<span style="font-family: inherit;"><span style="font-family: "times new roman"; font-size: small;"></span> <b><span style="color: #cc0000;"><i>μ</i><sub><i>r</i>+1</sub>' = <i>θ μ<sub>r</sub></i>' + <i>rσ</i><sup>2 </sup><i>μ</i><sub><i>r</i>-1</sub>' ; <i>r</i> = 1, 2, 3, .........</span></b></span><br />
<span style="font-family: inherit;"><br /></span>
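<div style="text-align: justify;"><span style="font-family: inherit;">This recursion is a one-liner to code up. Here's a short Python sketch (mine, with an illustrative function name) that seeds the recursion with <i>μ</i><sub>0</sub>' = 1 and <i>μ</i><sub>1</sub>' = <i>θ</i>, and reproduces the closed forms derived above:</span></div>

```python
# Normal raw moments via the recursion mu_{r+1}' = theta*mu_r' + r*sigma^2*mu_{r-1}',
# seeded with mu_0' = 1 and mu_1' = theta.
def normal_raw_moments(theta, sigma2, r_max):
    """Return [mu_0', mu_1', ..., mu_rmax'] for X ~ N[theta, sigma2]."""
    m = [1.0, theta]
    for r in range(1, r_max):
        m.append(theta * m[r] + r * sigma2 * m[r - 1])
    return m[:r_max + 1]

# For theta = 2, sigma^2 = 9:
print(normal_raw_moments(2.0, 9.0, 5))
# -> [1.0, 2.0, 13.0, 62.0, 475.0, 3182.0]
# (e.g. mu_3' = theta^3 + 3*theta*sigma^2 = 8 + 54 = 62.)
```
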
<b style="text-align: justify;"><span style="color: blue; font-family: inherit;">Chi-square distribution</span></b><br />
<div style="text-align: justify;">
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">We'll use the following standard result for the Chi-square distribution with "<i>v</i>" degrees of freedom:</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">"For any real-valued function, <i>h</i>, E[<i>h</i>(<i>χ</i><sup>2</sup><sub><i>v</i></sub>)] = <i>v</i> E[<i>h</i>(<i>χ</i><sup>2</sup><sub><i>v</i>+2</sub>) / <i>χ</i><sup>2</sup><sub><i>v</i>+2</sub>], provided that the expectations exist."</span></div>
</div>
</div>
</div>
</div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><span style="vertical-align: super;"><br /></span>
<span style="vertical-align: super;">(This result can be generalized to the case of a non-central Chi-square distribution. See Appendix B of Judge and Bock (1978).)</span></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit; font-size: 13.3333px;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><span style="vertical-align: super;">Let </span><i style="vertical-align: super;">h</i><span style="vertical-align: super;">(</span><i style="vertical-align: super;">x</i><span style="vertical-align: super;">) = <i>x</i></span><i style="vertical-align: super;"><sup>k</sup></i><span style="vertical-align: super;">, for some integer, <i>k</i>. Then, applying the above result repeatedly, we get:</span></span></div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: inherit;">If <i>k</i> = 1, then <i>μ</i><sub>1</sub>' = E[<i>h</i>(<i>χ</i><sup>2</sup><sub><i>v</i></sub>)] = <i>v</i> E[<i>χ</i><sup>2</sup><sub><i>v</i>+2</sub> / <i>χ</i><sup>2</sup><sub><i>v</i>+2</sub>] = <i>v .</i></span></li>
<li style="text-align: justify;"><span style="font-family: inherit;">If <i>k</i> = 2, then <i>μ</i><sub>2</sub>' = E[(<i>χ</i><sup>2</sup><sub><i>v</i></sub>)<sup>2</sup>] = <i>v</i> E[(<i>χ</i><sup>2</sup><sub><i>v</i>+2</sub>)<sup>2</sup> / <i>χ</i><sup>2</sup><sub><i>v</i>+2</sub>] = <i>v</i>E[<i>χ</i><sup>2</sup><sub><i>v</i>+2</sub>] = <i>v</i>(<i>v</i> + 2).</span></li>
</ul>
<div style="text-align: justify;">
<span style="font-family: inherit;">Immediately, we see that <i>μ</i><sub>2</sub> = Var.[<i>χ</i><sup>2</sup><sub><i>v</i></sub><span style="text-align: left;">] = <i>μ</i><sub>2</sub>' - (<i>μ</i><sub>1</sub>')<sup>2</sup> = 2<i>v</i>.</span></span></div>
<div style="text-align: justify;">
<ul>
<li><span style="font-family: inherit;"><span style="text-align: left;">If <i>k</i> = 3, then <i>μ</i><sub>3</sub>' = </span>E[(<i>χ</i><sup>2</sup><sub><i>v</i></sub>)<sup>3</sup>] = <i>v</i> E[(<i>χ</i><sup>2</sup><sub><i>v</i>+2</sub>)<sup>3</sup> / <i>χ</i><sup>2</sup><sub><i>v</i>+2</sub>] = <i>v</i>E[(<i>χ</i><sup>2</sup><sub><i>v</i>+2</sub>)<sup>2</sup>] = <i>v</i>(<i>v</i> + 2)(<i>v</i> + 4) .</span></li>
</ul>
</div>
<div style="font-size: medium;">
</div>
<span style="font-family: inherit;">I'll bet that you can see right away what the expressions are for <i>μ</i><sub>4</sub>', <i>μ</i><sub>5</sub>', <i>etc</i>.!</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">In terms of a genuine recurrence relationship, we see from the above that we can write:</span><br />
<span style="font-family: inherit;"><b style="text-align: justify;"><span style="color: blue;"><br /></span></b>
</span><br />
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: inherit;"> <span style="color: #cc0000;"> <b><i>μ<sub>r</sub></i>' = <i>μ</i><sub><i>r</i>-1</sub>' (<i>v</i> + 2(<i>r</i> -1)) ; <i>r</i> = 1, 2, 3, ........</b></span></span><br />
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">Although we can then use equation (1) to obtain the <i>central</i> moments of <i>X</i>, there's also a separate recursion formula for these moments in the case of the Chi-square distribution. This is discussed in the section on the Gamma distribution, below.</span></div>
</div>
</div>
</div>
</div>
</div>
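<div style="text-align: justify;"><span style="font-family: inherit;">Again, a few lines of Python suffice to trace out this recursion (my own sketch; the function name is illustrative only):</span></div>

```python
# Chi-square raw moments via mu_r' = mu_{r-1}' * (v + 2*(r - 1)),
# starting from mu_0' = 1.
def chisq_raw_moments(v, r_max):
    m = [1.0]
    for r in range(1, r_max + 1):
        m.append(m[-1] * (v + 2 * (r - 1)))
    return m

print(chisq_raw_moments(5, 3))
# -> [1.0, 5.0, 35.0, 315.0], i.e. v, v(v + 2), v(v + 2)(v + 4) for v = 5.
```
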
<span style="font-family: inherit;"><b style="text-align: justify;"><span style="color: blue;"><br /></span></b>
<b style="text-align: justify;"><span style="color: blue;">Student-t distribution</span></b></span></div>
</div>
</div>
</div>
</div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Suppose that <i>X</i> follows a Student-t distribution, with <i>v</i> degrees of freedom. Then the moments of <i>X</i> can be summarized as follows:</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><i>μ<sub>r</sub></i>' = 0 ; if <i>r</i> is <b>odd</b>, and 0 < <i>r</i> < <i>v</i></span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><i>μ<sub>r</sub></i>' = <i>v</i><sup>(<i>r</i>/2)</sup> Π[(2<i>i</i> - 1) / (<i>v</i> - 2<i>i</i>)] ; if <i>r</i> is <b>even</b>, and 0 < <i>r</i> < <i>v </i>(3)</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">where the product is for <i>i</i> = 1 to <i>i</i> = (<i>r</i> / 2).</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">The inequality, <i>v</i> > <i>r</i>, ensures that the moment "exists" - that is, it is finite. If <i>v</i> = 1, then <i>X</i> follows a Cauchy distribution, and <i>none</i> of its moments "exist".</span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Equation (3) gives us a nice, tidy, and direct formula for obtaining any of the even-order moments of <i>X</i>. For instance, we see that:</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><i>μ</i><sub>2</sub>' = <i>v</i> [(2- 1) / (<i>v</i> - 2)] = <i>v</i> / (<i>v</i> - 2) ; if <i>v</i> > 2. </span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Obviously, <i>μ</i><sub>2</sub> = Var.[<i>X</i>] = <i>μ</i><sub>2</sub><span style="text-align: left;">', because E[<i>X</i>] = 0. </span></span></div>
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Also,</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><i>μ</i><sub>4</sub>' = <i style="text-align: justify;">v</i><span style="text-align: justify;"><sup>2</sup></span><span style="text-align: justify;"> [1 / (</span><i style="text-align: justify;">v</i><span style="text-align: justify;"> - 2)][3 / (<i>v</i> - 4)] = 3</span><i style="text-align: justify;">v</i><span style="text-align: justify;"><sup>2</sup></span><span style="text-align: justify;"> / [(</span><i style="text-align: justify;">v</i><span style="text-align: justify;"> - 2)(<i>v</i> - 4)] ; if </span><i style="text-align: justify;">v</i><span style="text-align: justify;"> > 4. </span></span><br />
<span style="font-family: inherit;"><i>μ</i><sub>6</sub>' = <i style="text-align: justify;">v</i><span style="text-align: justify;"><sup>3</sup></span><span style="text-align: justify;"> [1 / (</span><i style="text-align: justify;">v</i><span style="text-align: justify;"> - 2)][3 / (<i>v</i> - 4)][5 / (<i>v</i> - 6)] = 15</span><i style="text-align: justify;">v</i><span style="text-align: justify;"><sup>3</sup></span><span style="text-align: justify;"> / [(</span><i style="text-align: justify;">v</i><span style="text-align: justify;"> - 2)(<i>v</i> - 4)(<i>v</i> - 6)] ; if </span><i style="text-align: justify;">v</i><span style="text-align: justify;"> > 6.</span></span><br />
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span>
<i><span style="font-family: inherit;">etc.</span></i><br />
<i><span style="font-family: inherit;"><br /></span></i></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">This set of results isn't in the form of a recurrence relationship, but we can obtain one very easily. Note that:</span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<span style="font-family: inherit;"><i>μ</i><sub>4</sub>' = [<i style="text-align: justify;">v</i><span style="text-align: justify;"> / (</span><i style="text-align: justify;">v</i><span style="text-align: justify;"> - 2)][3<i>v</i> / (<i>v</i> - 4)] = </span><i style="text-align: justify;">μ</i><sub style="text-align: justify;">2</sub><span style="text-align: justify;">' [3<i>v</i> / (<i>v</i> - 4)] ; </span><span style="text-align: justify;">if </span><i style="text-align: justify;">v</i><span style="text-align: justify;"> > 4. </span></span><br />
<div>
<span style="font-family: inherit; text-align: justify;"><br /></span></div>
<span style="font-family: inherit;"><i>μ</i><sub>6</sub>' = [<i style="text-align: justify;">v</i><span style="text-align: justify;"> / (</span><i style="text-align: justify;">v</i><span style="text-align: justify;"> - 2)][3<i>v</i> / (<i>v</i> - 4)][5<i>v</i> / (<i>v </i>- 6)] = </span><span style="text-align: justify;"> </span><i>μ</i><sub>4</sub>' <span style="text-align: justify;">[5</span><i style="text-align: justify;">v</i><span style="text-align: justify;"> / (</span><i style="text-align: justify;">v</i><span style="text-align: justify;"> - 6)] </span><span style="text-align: justify;"> ; </span><span style="text-align: justify;">if </span><i style="text-align: justify;">v</i><span style="text-align: justify;"> > 6,</span></span><br />
<span style="font-family: inherit;"><span style="text-align: justify;"><br /></span>
<span style="text-align: justify;">and so on.</span></span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Clearly, in general, the following recurrence formula holds -</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><i> <span style="color: #cc0000;"> <b>μ</b></span></i><b><span style="color: #cc0000;"><sub><i>r</i></sub>' = <i>μ</i><sub><i>r</i>-2</sub>' [(<i>r</i> - 1)<i>v</i> / (<i>v</i> - <i>r</i>)] ; if <i>r</i> is <span style="color: red;">even</span>, and 0 < <i>r</i> < <i>v</i>.</span></b></span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Trust me - this formula is <i><b>much more </b></i>appealing than integrating functions of the density, or working with the characteristic function, for this particular distribution.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Finally, note that because E[<i>X</i>] = <i>μ</i><sub>1</sub>' = 0, <i>μ</i><sub>r</sub> = <i>μ</i><sub>r</sub>', for all (even) <i>r</i>.</span><br />
<span style="font-family: inherit;"><br /></span>
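<div style="text-align: justify;"><span style="font-family: inherit;">Here's a small Python sketch of my own that traces out the even-order moments this way (note that the recursion carries the factor <i>v</i>, consistent with <i>μ</i><sub>4</sub>' = <i>μ</i><sub>2</sub>' [3<i>v</i> / (<i>v</i> - 4)] above; the function name is illustrative only):</span></div>

```python
# Even-order raw moments of Student-t(v) via the recursion
# mu_r' = mu_{r-2}' * (r - 1) * v / (v - r), for even r with 0 < r < v;
# the odd-order moments (for odd r < v) are all zero.
def t_even_raw_moments(v, r_max):
    m, prev = {}, 1.0                  # mu_0' = 1
    for r in range(2, r_max + 1, 2):
        if r >= v:
            break                      # moment doesn't "exist"
        prev *= (r - 1) * v / (v - r)
        m[r] = prev
    return m

print(t_even_raw_moments(10, 6))
# -> {2: 1.25, 4: 6.25, 6: 78.125}; e.g. mu_2' = v/(v - 2) = 1.25 for v = 10.
```
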
<b style="text-align: left;"><span style="color: blue; font-family: inherit;">F distribution</span></b><br />
<span style="font-family: inherit;"> </span><br />
<span style="font-family: inherit;">Suppose that <i>X</i> follows an F distribution, with numerator and denominator degrees of freedom, <i>v</i><sub>1</sub> and <i>v</i><sub>2</sub> respectively.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit; text-align: left;">The general formula for the raw moments of <i>X</i> is:</span><br />
<span style="font-family: inherit;"><span style="text-align: left;"><br /></span>
<span style="text-align: left;"><i>μ<sub>r</sub></i>' = </span>(<i>v</i><sub>2</sub> / <i>v</i><sub>1</sub>) <sup><i>r</i></sup> Γ[(<i>v</i><sub>1</sub> / 2) + <i>r</i>] Γ[(<i>v</i><sub>2</sub> / 2) - <i>r</i>] / (Γ[<i>v</i><sub>1</sub> / 2] Γ[<i>v</i><sub>2</sub> / 2]) . (4)</span></div>
</div>
</div>
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit; text-align: left;">So,</span><br />
<span style="font-family: inherit;"><span style="text-align: left;"><br /></span><span style="text-align: left;"><i>μ</i><sub>1</sub>' = </span>(<i>v</i><sub>2</sub> / <i>v</i><sub>1</sub>)Γ[(<i>v</i><sub>1</sub> / 2) + 1] Γ[(<i>v</i><sub>2</sub> / 2) - 1] / (Γ[<i>v</i><sub>1</sub> / 2] Γ[<i>v</i><sub>2</sub> / 2]) .</span><br />
<span style="font-family: inherit;"><span style="text-align: left;"><br /></span>
<span style="text-align: left;">Using the result that Γ[<i>x</i> + 1] = <i>x</i>Γ[<i>x</i>], and cancelling terms, this simplifies to</span></span><br />
<span style="font-family: inherit;"><span style="text-align: left;"><br /></span>
<span style="text-align: left;"><i>μ</i><sub>1</sub>' = <i>v</i><sub>2</sub> / (</span><i>v</i><sub>2</sub> - 2) ; if <i>v</i><sub>2</sub> > 2</span><br />
<span style="font-family: inherit;"><span style="text-align: left;"><br /></span>
<span style="text-align: left;">Similarly,</span></span><br />
<span style="font-family: inherit;"><span style="text-align: left;"><br /></span>
<span style="text-align: left;"><i>μ</i><sub>2</sub>' = (<i>v</i><sub>2</sub> / <i>v</i><sub>1</sub>)<sup>2</sup> Γ[(<i>v</i><sub>1</sub> / 2) + 2] Γ[(<i>v</i><sub>2</sub> / 2) - 2] / (Γ[<i>v</i><sub>1</sub> / 2] Γ[<i>v</i><sub>2</sub> / 2]) .</span></span></div>
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><span style="text-align: left;">Again, noting that </span>Γ[(<i>v</i><sub>1</sub> / 2) + 2] = ((<i>v</i><sub>1</sub> / 2) + 1)Γ[(<i>v</i><sub>1</sub> / 2) + 1]; that Γ[(<i>v</i><sub>2</sub> / 2) - 2] = Γ[(<i>v</i><sub>2</sub> / 2) - 1] / ((<i>v</i><sub>2</sub> / 2) - 2); and then simplifying, we get:</span><br />
<span style="font-family: inherit;"><span style="text-align: left;"><br /></span><span style="text-align: left;"><i>μ</i><sub>2</sub>' = </span><i>μ</i><sub>1</sub>' (<i>v</i><sub>2</sub> /<i>v</i><sub>1</sub>)[(<i>v</i><sub>1</sub> + 2) / (<i>v</i><sub>2</sub> - 4)] ; if <i>v</i><sub>2</sub> > 4</span><br />
<span style="font-family: inherit;"><span style="text-align: left;"><br /></span>
<span style="text-align: left;">Proceeding in a similar manner, we find that:</span></span><br />
<span style="font-family: inherit;"><span style="text-align: left;"><br /></span><span style="text-align: left;"><i>μ</i><sub>3</sub>' = </span><i>μ</i><sub>2</sub>' (<i>v</i><sub>2 </sub>/ <i>v</i><sub>1</sub>)[(<i>v</i><sub>1</sub> + 4) / (<i>v</i><sub>2</sub> - 6)] ; if <i>v</i><sub>2</sub> > 6 </span><br />
<span style="font-family: inherit;"><i><br /></i>
<i>μ</i><sub>4</sub>' = <i>μ</i><sub>3</sub>' (<i>v</i><sub>2</sub> / <i>v</i><sub>1</sub>)[(<i>v</i><sub>1</sub> + 6) / (<i>v</i><sub>2</sub> - 8)] ; if <i>v</i><sub>2</sub> > 8</span><br />
<span style="font-family: inherit;"><span style="text-align: left;"><br /></span>
<span style="text-align: left;"><i>etc.</i></span></span><br />
<span style="font-family: inherit;"><span style="text-align: left;"><br /></span>
<span style="text-align: left;">So, in general you can see that we have the following recursion formula for the moments of <i>X</i>:</span></span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><span style="text-align: left;"> <span style="color: #cc0000;"> <b><i>μ</i><sub><i>r</i>+1</sub>' = <i>μ<sub>r</sub></i>' (<i>v</i><sub>2</sub> / <i>v</i><sub>1</sub>)[(<i>v</i><sub>1</sub> + 2<i>r</i>) / (<i>v</i><sub>2</sub> - 2(<i>r</i> + 1))] ; <i>r </i>= 0, 1, 2, .............; if </b></span></span><b><span style="color: #cc0000;"><i>v</i><sub>2</sub> > 2(<i>r</i> + 1)</span>.</b></span><br />
<span style="font-family: inherit;"><br /></span>
<br />
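<div style="text-align: justify;"><span style="font-family: inherit;">Here's a quick Python sketch of my own (illustrative function name) that runs this recursion from <i>μ</i><sub>0</sub>' = 1, stopping once the existence condition fails:</span></div>

```python
# F(v1, v2) raw moments via the recursion
# mu_{r+1}' = mu_r' * (v2/v1) * (v1 + 2r) / (v2 - 2(r + 1)),
# starting from mu_0' = 1, valid while v2 > 2(r + 1).
def f_raw_moments(v1, v2, r_max):
    m, cur = [], 1.0
    for r in range(r_max):
        if v2 <= 2 * (r + 1):
            break                      # moment doesn't "exist"
        cur *= (v2 / v1) * (v1 + 2 * r) / (v2 - 2 * (r + 1))
        m.append(cur)
    return m                           # [mu_1', mu_2', ...]

print(f_raw_moments(4, 10, 3))
# -> [1.25, 3.125, 15.625]; note mu_1' = v2/(v2 - 2) = 1.25, as it should be.
```
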
<div style="text-align: justify;">
<span style="font-family: inherit; text-align: left;">This recurrence relationship spares us from dealing with the gamma functions in (4), let alone performing any integration, or working with this distribution's messy characteristic function. (As with the Student-t distribution, the moment generating function isn't defined here, because of the above conditions on the existence of the moments.)</span><br />
<span style="font-family: inherit;"><span style="text-align: left;"><br /></span>
The distributions that we've considered so far are ones that you use every day in your econometrics work. Now let's consider a couple more distributions that arise a bit less frequently.</span><br />
<div>
<span style="font-family: inherit;"><br /></span></div>
</div>
<b><span style="color: blue; font-family: inherit;">Gamma distribution</span></b><br />
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">One situation where the Gamma distribution comes up in econometrics is in the context of "count data". This might seem a bit odd, because count data are non-negative <i>integers</i>, and we're talking about <i>continuous</i> random variables here.</span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">However, the Gamma distribution comes into play when we generalize the Poisson distribution to a particular form of the Negative Binomial distribution. Both of these distributions were discussed in my <a href="https://davegiles.blogspot.com/2019/04/recursions-for-moments-of-some-discrete.html">previous, related, post</a>. Looking back at that post, you'll be able to see that the variance of a Poisson random variable equals its mean. We say that the distribution is "equi-dispersed". This is very restrictive, and isn't realistic with a lot of count data in practice. On the other hand, the variance of the Negative Binomial distribution <i>exceeds</i> its mean. We say that this distribution is "over-dispersed", and in practice this is often more reasonable.</span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">To construct the Negative Binomial distribution in the form that we usually use it in econometrics, we take a Poisson random variable and then add in an unobserved random effect to its (conditional) mean. If this random effect follows a Gamma distribution, we end up with the Negative Binomial distribution for the count data. (See Greene, 2012, pp.806-807 for details.)</span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">With this by way of motivation, consider a random variable, <i>X</i> that follows a Gamma distribution with a shape parameter '<i>a</i>', and a scale parameter '<i>b</i>'. (Be careful here - there are two forms of the Gamma distribution. The other one has the shape parameter '<i>a</i>' and the rate parameter, <i>θ</i> = 1 / <i>b</i>.)</span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">Willink (2003) shows that the <b style="font-style: italic;">central moments </b>(<b><i>moments about the mean</i></b>) for <i>X</i> can be obtained from the following recursion -</span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"> <b><span style="color: #cc0000;"><i>μ<sub>r</sub></i> = (<i>r</i> - 1)(<i>bμ</i><sub><i>r</i>-1</sub> + <i>ab</i><sup>2</sup><i>μ</i><sub><i>r</i>-2</sub>) ; <i>r</i> = 2, 3, ..........</span></b></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
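<div style="text-align: justify;"><span style="font-family: inherit;">A short Python sketch (mine; the function name is illustrative) makes the recursion concrete, seeding it with <i>μ</i><sub>0</sub> = 1 and <i>μ</i><sub>1</sub> = 0:</span></div>

```python
# Gamma(shape a, scale b) central moments via Willink's recursion:
# mu_r = (r - 1) * (b*mu_{r-1} + a*b^2*mu_{r-2}), with mu_0 = 1, mu_1 = 0.
def gamma_central_moments(a, b, r_max):
    m = [1.0, 0.0]
    for r in range(2, r_max + 1):
        m.append((r - 1) * (b * m[r - 1] + a * b * b * m[r - 2]))
    return m

# With a = v/2 and b = 2 this gives the Chi-square central moments; for v = 5:
print(gamma_central_moments(2.5, 2.0, 4))
# -> [1.0, 0.0, 10.0, 40.0, 540.0], i.e. mu_2 = 2v = 10, etc.
```
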
<div style="text-align: justify;">
<span style="font-family: inherit;">There are two special cases of the Gamma distribution that we might note. First, if <i>a</i> = (<i>v</i> / 2), and <i>b</i> = 2, then <i>X</i> follows a Chi-square distribution with <i>v</i> degrees of freedom. So, the <b><i>central moments</i></b> of the Chi-square distribution follow the recursion relationship:</span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"> <b><span style="color: #cc0000;"><i>μ<sub>r</sub></i> = 2(<i>r</i> - 1)(<i>μ</i><sub><i>r</i>-1</sub> + <i>vμ</i><sub><i>r</i>-2</sub>) ;<span style="text-align: left;"> </span><i style="text-align: left;">r</i><span style="text-align: left;"> = 2, 3, ..........</span></span></b></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">Second, if <i>a</i> = <i>b</i> = 1, then <i>X</i> follows an Exponential distribution, and its <i style="font-weight: bold;">central moments</i> satisfy:</span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"> </span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"> <span style="color: #cc0000;"><b><i>μ<sub>r</sub></i> = (<i>r</i> - 1)(<i>μ</i><sub><i>r</i>-1</sub> + <i>μ</i><sub><i>r</i>-2</sub>) </b> <b>; <i>r</i> = 2, 3, .........</b></span></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">In addition, Withers (1992) shows that for the Exponential distribution we have the following simpler recursion:</span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"> <span style="color: #cc0000;"><b><i>μ<sub>r</sub></i> = <i>rμ</i><sub><i>r</i>-1</sub> + (-1)<sup><i>r</i></sup> </b> <b> ; <i>r</i> = 1, 2, 3, .......</b></span></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
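These recursions are trivial to code up. Here's a minimal Python sketch (the function names are mine; my own illustrative code, on the code page, is in R) that implements Willink's Gamma recursion and confirms that the Exponential special case agrees with Withers' simpler formula:

```python
# Central moments of a Gamma(shape = a, scale = b) variate via Willink's (2003)
# recursion: mu_r = (r - 1) * (b * mu_{r-1} + a * b^2 * mu_{r-2}).
def gamma_central_moments(a, b, r_max):
    mu = [1.0, 0.0]  # mu_0 = 1; the first central moment is always zero
    for r in range(2, r_max + 1):
        mu.append((r - 1) * (b * mu[r - 1] + a * b * b * mu[r - 2]))
    return mu  # mu[r] is the r-th central moment

# Exponential distribution: a = b = 1.
expo = gamma_central_moments(1, 1, 6)

# Withers' (1992) simpler recursion for the Exponential:
# mu_r = r * mu_{r-1} + (-1)^r.
withers = [1.0]
for r in range(1, 7):
    withers.append(r * withers[r - 1] + (-1) ** r)

print(expo)     # mu_2 = 1 (the variance), mu_3 = 2, mu_4 = 9, ...
print(withers)  # the same sequence
```

Setting a = v / 2 and b = 2 likewise reproduces the familiar Chi-square results, e.g. μ<sub>2</sub> = 2<i>v</i> and μ<sub>3</sub> = 8<i>v</i>.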
<div style="text-align: justify;">
<b><span style="color: blue; font-family: inherit;">Beta distribution</span></b><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">One of the important features of the Beta distribution is that its density's support is the unit interval (although this can be generalized). It's one of the few distributions used by econometricians that has a finite support.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">This suggests that it might be useful when modelling data that are continuous in nature, and are in the form of proportions. Indeed, this is the case. I discuss this in the context of consumer demand analysis in <a href="https://davegiles.blogspot.com/2013/07/allocation-models-with-bounded.html">this old post</a>. More generally, the Beta regression model is discussed in detail by Ferrari and Cribari-Neto (2004) and Cribari-Neto and Zeileis (2010).</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">The density function for <i>X</i>, when it follows a Beta distribution, is:</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><i>f </i>(<i>x</i> | <i>α</i>, <i>β</i>) = [Γ(<i>α</i> + <i>β</i>) / (Γ(<i>α</i>)Γ(<i>β</i>))] <i>x </i><sup><i>α</i>-1</sup> (1 - <i>x</i>) <sup><i>β</i>-1</sup> ; 0 < <i>x</i> < 1 ; <i>α</i> , <i>β</i> > 0</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">The general formula for the <i>r</i>th. raw moment of <i>X</i> is</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><i>μ<sub>r</sub></i>' = E[<i>X <sup>r</sup></i>] = [ (<i>α</i> + <i>r</i> - 1)(<i>α</i> + <i>r</i> - 2) ...... <i>α</i>] / [(<i>α</i> + <i>β</i> + <i>r</i> - 1)(<i>α</i> + <i>β</i> + <i>r</i> - 2) .....(<i>α</i> + <i>β</i>)] ; <i>r </i>= 1, 2, 3, ......</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Immediately, we have:</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><i>μ</i><sub>1</sub>' = α<i> </i>/ (<i>α</i> + <i>β</i>)</span><br />
<span style="font-family: inherit;"><i>μ</i><sub>2</sub>' = [<i>α </i>(<i>α</i> + 1)] / [(<i>α</i> + <i>β</i> + 1)(<i>α</i> + <i>β</i>)] = [(<i>α</i> + 1) / (<i>α</i> + <i>β</i> + 1)] <i>μ</i><sub>1</sub>'</span><br />
<span style="font-family: inherit;"><i>μ</i><sub>3</sub>' = [<i>α</i> (<i>α</i> + 1)(<i>α</i> + 2)] / [(<i>α</i> + <i>β</i> + 2)(<i>α</i> + <i>β</i> + 1)(<i>α</i> + <i>β</i>)] = [(<i>α</i> + 2) / (<i>α</i> + <i>β</i> + 2)] <i>μ</i><sub>2</sub>'</span><br />
<span style="font-family: inherit;"><i>etc</i>.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">So, a general recursion formula for the raw moments of the Beta distribution is:</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><i> <b><span style="color: #cc0000;">μ<sub>r</sub></span></b></i><b><span style="color: #cc0000;">' = [(<i>α</i> + <i>r</i> - 1) / (<i>α</i> + <i>β</i> + <i>r</i> - 1)] <i>μ</i><sub><i>r-</i>1</sub><span style="text-align: left;">' ; </span><i style="text-align: left;">r</i><span style="text-align: left;"> = 1, 2, 3, .....</span></span></b></span></div>
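This recursion is a one-liner to implement. A minimal Python sketch (the function name is mine):

```python
# Raw moments of a Beta(alpha, beta) variate via the recursion
# mu'_r = [(alpha + r - 1) / (alpha + beta + r - 1)] * mu'_{r-1},  mu'_0 = 1.
def beta_raw_moments(alpha, beta, r_max):
    mu = [1.0]
    for r in range(1, r_max + 1):
        mu.append(mu[r - 1] * (alpha + r - 1) / (alpha + beta + r - 1))
    return mu  # mu[r] is E[X^r]

m = beta_raw_moments(2.0, 3.0, 3)
print(m[1], m[2])        # mean = 2/5, E[X^2] = 1/5
print(m[2] - m[1] ** 2)  # variance = alpha*beta/((alpha+beta)^2 (alpha+beta+1)) = 0.04, up to rounding
```

For α = 2, β = 3 the first raw moment is 2/5, matching μ<sub>1</sub>' = α / (α + β) directly.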
</div>
</div>
<span style="font-family: inherit;"><br /></span>
<b><span style="color: blue; font-family: inherit;">Numerical examples</span></b><br />
<div style="text-align: justify;">
<b><span style="color: blue; font-family: inherit;"><br /></span></b></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">Remember, the whole point about these recursion formulae is that they help us to rapidly compute <i>all of the moments</i> of a distribution, up to some pre-specified maximum order, using one general formula.</span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">There is an R script file on the <a href="https://davegiles.blogspot.com/p/code.html">code page</a> for this blog that illustrates this point, first for <i>X</i> ~ N[<i>θ</i>, <i>σ</i><sup>2</sup>]; and second for <i>X</i> ~ F[<i>v</i><sub>1</sub>,<i>v</i><sub>2</sub>]. In the first case the first ten raw moments for <i>X</i> when <i>θ</i> = 1 and <i>σ</i><sup>2</sup> = 4 are: 1, 5, 13, 73, 281, 1741, 8485, 57233, 328753, 2389141, ....</span></div>
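The Normal-case numbers can be checked in a few lines. Here's a Python sketch (mine, not the R script itself) that assumes the Stein's-lemma recursion for Normal raw moments, μ<i><sub>r</sub></i>' = <i>θμ</i><sub><i>r</i>-1</sub>' + (<i>r</i> - 1)<i>σ</i><sup>2</sup><i>μ</i><sub><i>r</i>-2</sub>'; its output matches the values quoted above:

```python
# Raw moments of X ~ N[theta, sigma^2] via the recursion implied by Stein's lemma:
# mu'_r = theta * mu'_{r-1} + (r - 1) * sigma^2 * mu'_{r-2}, with mu'_0 = 1.
def normal_raw_moments(theta, sigma2, r_max):
    mu = [1.0, float(theta)]
    for r in range(2, r_max + 1):
        mu.append(theta * mu[r - 1] + (r - 1) * sigma2 * mu[r - 2])
    return mu[1:]  # drop mu'_0; element r-1 is E[X^r]

print(normal_raw_moments(1, 4, 10))
# 1, 5, 13, 73, 281, 1741, 8485, 57233, 328753, 2389141 (as floats)
```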
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">In the second case, the first ten raw moments for </span><i style="font-family: inherit;">X </i><span style="font-family: inherit;">when </span><i style="font-family: inherit;">v</i><sub style="font-family: inherit;">1</sub><span style="font-family: inherit;"> = 7 and </span><i style="font-family: inherit;">v</i><sub style="font-family: inherit;">2</sub><span style="font-family: inherit;"> = 24 are: 1.09, 1.68, 3.5265, 9.82, 36.09, 175.28, 1141.85, 10276.63, 135064.30, 2894235.00, ....</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><b>References</b></span></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><a href="https://poseidon01.ssrn.com/delivery.php?ID=338006067086065015013013102073100010056016063065052016092088110076070017014026015097026052001024039049007018029124009066101003051040007073093001078031110003105126028058084017088108087007004111087091091025094094121116025102087076001071072080010089071&EXT=pdf">Alghalith, M.</a>, undated. A note on generalizing and extending Stein's lemma.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><a href="https://www.tandfonline.com/doi/abs/10.1080/00031305.1969.10481867">Bain, L. J</a>., 1969. Moments of non-central t and non-central F distributions. <i>American Statistician</i>, 23, 33-34.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><a href="https://www.jstor.org/stable/2287118?seq=1#page_scan_tab_contents">Cobb, L., P. Koppstein, & N. H. Chen</a>, 1983. Estimation and moment recursion relations for multimodal distributions of the exponential family. <i>Journal of the American Statistical Association</i>, 78, 124-130.</span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><a href="http://www.jstatsoft.org/v34/i02/">Cribari-Neto, F. & A. Zeileis</a>, 2010. Beta regression in R. <i>Journal of Statistical Software</i>, 34, 1–24.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><a href="https://orfe.princeton.edu/~alaink/NJ_aTaxiOrf467F12/Papers/Machine%20Learning/beta.pdf">Ferrari, S. L. P. & F. Cribari-Neto</a>, 2004. Beta regression for modelling rates and proportions. <i>Journal of Applied Statistics</i>, 31, 799–815.</span><br />
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><a href="https://www.amazon.com/Econometric-Analysis-7th-William-Greene/dp/0131395386/ref=sr_1_3?crid=2RRGDK0UC2P7Y&keywords=greene+econometric+analysis&qid=1556585157&s=gateway&sprefix=greene+econom%2Caps%2C517&sr=8-3">Greene, W. H</a>., 2012. <i>Econometric Analysis</i>, 7th ed. Prentice Hall, Upper Saddle River, NJ.</span>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><a href="https://www.amazon.com/statistical-implications-Stein-rule-econometrics-mathematical/dp/072040729X/ref=sr_1_1?keywords=judge+and+bock+pre-test+stein&qid=1556585368&s=gateway&sr=8-1-spell">Judge, G. G. & M. E. Bock</a>, 1978. <i>The Statistical Implications of Pre-Test and Stein-Rule Estimators in Econometrics</i>. North-Holland, New York.</span>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><a href="https://poseidon01.ssrn.com/delivery.php?ID=262102004111026069000116018018094106116003041088086064113095069094031124110117020006000106044041050001050121126069069079078092000040001010006113016096030113112113057076075083086098030079091096078087001095083123073117106123092081123089108003023025024&EXT=pdf">Landsman, Z., S. Vanduffel, & J. Yao</a>, 2013. A note on Stein's lemma for multivariate elliptical distributions. <i>Journal of Statistical Planning and Inference</i>, 143, 2016-2022.</span>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;">Stein, C. M., 1973. Estimation of the mean of a multivariate normal distribution. <i>Proceedings of the Prague Symposium on Asymptotic Statistics</i>, 345-381. </span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><a href="https://www.tandfonline.com/doi/abs/10.1081/STA-120018823?src=recsys&journalCode=lsta20">Willink, R.</a>, 2003. Relationships between central moments and cumulants, with formulae for the central moments of gamma distributions. <i>Communications in Statistics - Theory and Methods</i>, 32, 701-704.</span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><a href="https://www.sciencedirect.com/science/article/pii/S0167715205001124">Willink, R.</a>, 2005. Normal moments and Hermite polynomials. <i>Statistics and Probability Letters</i>, 73, 271-275.</span><br />
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><a href="https://www.stats.org.nz/new-zealand-statistician/">Withers, C. S.</a>, 1992. A recurrence relation for the moments of the exponential. <i>New Zealand Statistician</i>, 27, 13-14.</span>
<div style="text-align: justify;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: inherit;"><a href="https://www.sciencedirect.com/science/article/pii/0304407679900897">Woodland, A. D.</a>, 1979. Stochastic specification and the estimation of share equations. <i>Journal of Econometrics</i>, 10, 361-383.<i> </i></span></div>
<center>
<span style="font-family: inherit;">
© 2019, David E. Giles
</span></center>
</div>
Recursions for the Moments of Some Discrete Distributions (2019-04-21)<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
You could say, "Moments maketh the distribution". While that's not <i>quite </i>true, it's pretty darn close.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The moments of a probability distribution provide key information about the underlying random variable's behaviour, and we use these moments for a multitude of purposes. Before proceeding, let's be sure that we're on the same page here.<br />
<br />
<a name='more'></a></div>
<div style="text-align: justify;">
<br />
<b><span style="color: #cc0000;">Some background</span></b><br />
<br /></div>
<div style="text-align: justify;">
Suppose that we have a random variable, <i>X</i>, whose distribution function is <i>F</i>(<i>x</i>), where <i>x</i> is some value of <i>X</i>. The following quote comes from <a href="https://davegiles.blogspot.com/2011/10/its-all-in-moments.html">an old blog post</a> of mine:</div>
<div style="text-align: justify;">
<blockquote class="tr_bq" style="background-color: white; color: #444444;">
<span style="font-family: inherit;">"What is sometimes called "the problem of moments" tells us:</span></blockquote>
<blockquote class="tr_bq">
<em style="background-color: white; color: #444444;"><span style="font-family: inherit;">If all of the moments of a distribution exist, then knowledge of these moments is equivalent to knowledge of the distribution itself.</span></em></blockquote>
<blockquote class="tr_bq" style="background-color: white; color: #444444;">
<span style="font-family: inherit;">In other words, the moments completely define the distribution.</span><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;"></span></blockquote>
<blockquote class="tr_bq" style="background-color: white; color: #444444;">
<span style="font-family: inherit;">However, note the word, 'if' in the statement of the result above. And it's a very big 'if'! The problem is that for many distributions the moments exist only under certain conditions; and for some distributions some or <em><strong>all</strong></em> of the moments fail to be defined. In these cases, the "theorem" is of limited help. A sufficient condition for a distribution to completely and uniquely determined by its moments is that its moment generating function (m.g.f.) exists."</span></blockquote>
<span style="background-color: white; color: #444444; font-family: inherit;">The <em>r</em><sup>th</sup> "raw moment" (or "moment about zero") for the random variable, <em>X</em>, is defined as </span><em style="background-color: white; color: #444444;">μ<sub>r</sub></em><span style="background-color: white; color: #444444;">' = E[</span><em style="background-color: white; color: #444444;">X <sup>r</sup></em><span style="background-color: white; color: #444444;">] </span><span style="background-color: white; color: #444444; font-family: inherit;">= ∫ </span><em style="background-color: white; color: #444444; font-family: inherit;">x<sup>r</sup>dF</em><span style="background-color: white; color: #444444; font-family: inherit;">(</span><em style="background-color: white; color: #444444; font-family: inherit;">x</em><span style="background-color: white; color: #444444; font-family: inherit;">) ; <i>r</i> = 1, 2, ..... </span><br />
<span style="color: #444444;"><br /></span>
<span style="color: #444444;">We also use the "central (centered) moments" of <i>X</i>, which are defined as </span><em style="color: #444444;">μ<sub>r</sub></em><span style="background-color: white; color: #444444;"> = E[(</span><em style="color: #444444;">X - </em><span style="color: #444444;"><i>μ</i><sub>1</sub></span><span style="background-color: white; color: #444444;">' </span><span style="color: #444444;">)</span><em style="color: #444444;"><sup>r</sup></em><span style="background-color: white; color: #444444;">] ; <i>r</i> = 1, 2, 3, ....... Strictly speaking, we're talking about "integer-order" moments in both cases here. (Remember, <i>r</i> = 1, 2, 3, ....) There are also measures called fractional moments and factorial moments, but we won't be concerned with those.</span><br />
<span style="background-color: white; color: #444444;"><br /></span>
<span style="background-color: white; color: #444444;">So, obviously, </span><span style="color: #444444;"><i>μ</i><sub>1</sub></span><span style="background-color: white; color: #444444;">' = E[<i>X</i>] is just the mean of <i>X</i>; and </span><span style="color: #444444;"><i>μ</i><sub>2</sub></span><span style="background-color: white; color: #444444;"> = E[(</span><em style="color: #444444;">X - </em><span style="color: #444444;">E</span><span style="color: #444444;">(</span><em style="color: #444444;">X</em><span style="color: #444444;">)</span><span style="color: #444444;">)<sup>2</sup></span><span style="background-color: white; color: #444444;">] is the variance of <i>X</i>. We can always express the "raw moments" in terms of the "centered moments", and <i>vice versa</i>. For example, you'll be aware that Var.[<i>X</i>] = E[<i>X </i><sup>2</sup>] - [E(<i>X</i>)]<sup>2</sup>. That is, </span><span style="color: #444444;"><i>μ</i><sub>2</sub></span><span style="background-color: white; color: #444444;"> = </span><span style="color: #444444;"><i>μ</i><sub>2</sub>' </span>- (<i style="color: #444444;">μ</i><sub>1</sub>')<sup>2</sup>.<br />
<br />
In general, the relationships between these two forms of the moments can be obtained by noting that for some values, <i>a</i> and <i>b</i>,<br />
<br />
E[(<i>X</i> - <i>b</i>) <sup><i>r</i></sup>] = Σ{[<i>r</i>! / (<i>i</i>! (<i>r</i> - <i>i</i>)!)] E[(<i>X</i> - <i>a</i>)<sup><i>i </i></sup>] (<i>a</i> - <i>b</i>) <sup><i>r-i </i></sup>} ,<br />
<br />
where the range of summation is from <i>i</i> = 0 to <i>i</i> = <i>r</i>.<br />
<br />
If we set <i>a</i> = 0, and <i>b</i> = <span style="color: #444444;"><i>μ</i><sub>1</sub></span><span style="background-color: white; color: #444444;">', we can obtain the centered moments from the raw moments, as follows:</span><br />
<span style="background-color: white; color: #444444;"><br /></span>
<b> <i>μ<sub>r</sub></i> = Σ{[<i>r</i>! / (<i>i</i>! (<i>r</i> - <i>i</i>)!)] <i>μ<sub>i</sub></i>' (- <i>μ</i><sub>1</sub><i>' </i>) <sup><i>r-i </i></sup>} </b>, (1)<br />
<br />
where the range of summation is from <i>i</i> = 0 to <i>i</i> = <i>r</i>.<br />
<span style="background-color: white; color: #444444;"><br /></span>
<span style="background-color: white; color: #444444;">Similarly, if we set <i>a</i> = </span><span style="color: #444444;"><i>μ</i><sub>1</sub></span><span style="background-color: white; color: #444444;">', and <i>b</i> = 0, we can obtain the raw moments from the centered moments:</span><br />
<span style="background-color: white; color: #444444;"><br /></span>
<b> <i>μ<sub>r</sub></i>' = Σ{[<i>r</i>! / (<i>i</i>! (<i>r</i> - <i>i</i>)!)] <i>μ<sub>i</sub></i><sup><i> </i></sup>(<i>μ</i><sub>1</sub><i>' </i>) <sup><i>r-i </i></sup>} </b>, (2)<br />
<br />
where, again, the range of summation is from <i>i</i> = 0 to <i>i</i> = <i>r</i>.<br />
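Formulae (1) and (2) are easy to mechanize. A minimal Python sketch (the helper names are mine):

```python
from math import comb

# Formula (1): central moments from raw moments,
# mu_r = sum_i C(r, i) * mu'_i * (-mu'_1)^(r - i).
# Conventions: raw[r] = mu'_r and central[r] = mu_r, with index-0 entries equal to 1.
def central_from_raw(raw):
    m1 = raw[1]
    return [sum(comb(r, i) * raw[i] * (-m1) ** (r - i) for i in range(r + 1))
            for r in range(len(raw))]

# Formula (2): raw moments from central moments, given the mean mu'_1.
def raw_from_central(central, m1):
    return [sum(comb(r, i) * central[i] * m1 ** (r - i) for i in range(r + 1))
            for r in range(len(central))]

# For example, the first four raw moments of a N[1, 4] variate:
raw = [1, 1, 5, 13, 73]
central = central_from_raw(raw)
print(central)                       # [1, 0, 4, 0, 48]: variance 4, mu_4 = 3*sigma^4
print(raw_from_central(central, 1))  # recovers [1, 1, 5, 13, 73]
```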
<span style="background-color: white; color: #444444;"><br /></span>
Often, the first couple of moments of a distribution (<i>i.e.</i>, of a random variable) may be all that interest us. For instance, the Mean Squared Error (MSE) of an estimator, <span style="background-color: white; color: #444444;"><i>θ</i>*</span>, of a parameter, <i>θ</i>, is just its variance plus the square of its bias. In other words, MSE (<span style="background-color: white; color: #444444;"><i>θ</i>*</span>) = <span style="color: #444444;"><i>μ</i><sub>2</sub></span><span style="background-color: white; color: #444444;"> + </span>(<i style="color: #444444;">μ</i><sub>1</sub><span style="color: #444444; font-size: 13.3333px;">'</span><span style="font-size: 13.3333px;"> - </span><span style="background-color: white; color: #444444;"><i>θ</i>)<sup>2</sup>, where the moments here are those of the sampling distribution of the estimator, <i>θ</i>*.</span>
<span style="background-color: white; color: #444444;"><br /></span>
<span style="color: #444444;"><span style="background-color: white;">However, we use the third and fourth (central) moments when defining skewness and kurtosis. For example, Pearson's definitions of these two measures are </span></span><span style="color: #444444;"><span style="background-color: white;">Skew (<i>X</i>) = </span></span><span style="color: #444444;"><i>μ</i><sub>3</sub> </span><span style="background-color: white; color: #444444;">/ </span><span style="color: #444444;"><span style="font-size: 13.3333px;">(</span></span><i style="color: #444444;">μ</i><sub>2</sub>)<sup>3/2</sup><span style="color: #444444;"><span style="background-color: white;">, and Kurtosis (<i>X</i>) = </span></span><span style="color: #444444;"><i>μ</i><sub>4</sub> / (</span><i style="color: #444444;">μ</i><sub>2</sub>)<sup>2</sup><span style="background-color: white; color: #444444; text-align: left;">. Notice that both of these quantities are <i>unitless</i>. A good example of where we use them is in the construction of tests for Normality. For instance, the null hypothesis for the Jarque-Bera test is that the skewness = 0 and the kurtosis = 3. The Normal distribution has this property (though it's not the only one that does). So the J-B test, which is a Lagrange multiplier ("score") test, is testing the validity of <b>two</b> restrictions on the parameters of the distribution. That's why the test statistic always has a null distribution that's Chi-square, with <b>two </b>degrees of freedom. </span><br />
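For illustration (this snippet is mine, not something from earlier posts), both Pearson measures, and the J-B statistic JB = (n/6)[Skew^2 + (Kurt - 3)^2 / 4], are simple functions of the central moments:

```python
# Pearson's moment-based skewness and kurtosis, plus the Jarque-Bera statistic.
def skew_kurt(mu2, mu3, mu4):
    """Skew = mu3 / mu2^(3/2); Kurtosis = mu4 / mu2^2 (both unitless)."""
    return mu3 / mu2 ** 1.5, mu4 / mu2 ** 2

def jarque_bera(n, skew, kurt):
    return (n / 6.0) * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

# Central moments of a N[theta, 4] variate: mu_2 = 4, mu_3 = 0, mu_4 = 3 * 16 = 48.
s, k = skew_kurt(4.0, 0.0, 48.0)
print(s, k, jarque_bera(100, s, k))  # skew 0, kurtosis 3, so JB = 0 at the null values
```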
<span style="background-color: white; color: #444444; text-align: left;"><br /></span>
<span style="background-color: white; color: #444444; text-align: left;">Higher-order moments can be used to introduce additional shape parameters into various distributions. Sometimes the existence of up to the first eight moments is needed in the proofs of various results in statistics and econometrics.</span><br />
<span style="background-color: white; color: #444444; text-align: left;"><br /></span></div>
</div>
<div style="text-align: justify;">
<span style="color: #444444;"><span style="background-color: white;">Now look back at the basic definition of the raw moments, namely, </span></span><em style="background-color: white; color: #444444; text-align: justify;">μ<sub>r</sub></em><span style="background-color: white; color: #444444; text-align: justify;">' = E[</span><em style="background-color: white; color: #444444; text-align: justify;">X <sup>r</sup></em><span style="background-color: white; color: #444444; text-align: justify;">] </span><span style="background-color: white; color: #444444; text-align: justify;">= ∫ </span><em style="background-color: white; color: #444444; font-family: inherit; text-align: justify;">x<sup>r</sup>dF</em><span style="background-color: white; color: #444444; text-align: justify;">(</span><em style="background-color: white; color: #444444; font-family: inherit; text-align: justify;">x</em><span style="background-color: white; color: #444444; text-align: justify;">) ; <i>r</i> = 1, 2, ..... ; and </span><span style="color: #444444;"><span style="background-color: white;"> </span></span><em style="background-color: white; color: #444444; text-align: justify;">μ<sub>r</sub></em><span style="background-color: white; color: #444444; text-align: justify;"> = E[(</span><em style="background-color: white; color: #444444; text-align: justify;">X - μ</em><span style="background-color: white; color: #444444; text-align: justify;"><sub>1</sub>') <sup style="font-style: italic;">r </sup></span><span style="background-color: white; color: #444444; text-align: justify;">] </span><span style="background-color: white; color: #444444; text-align: justify;">= ∫ (</span><span style="background-color: white; color: #444444; font-family: inherit; text-align: justify;"><i>x - μ</i><sub>1</sub>') <sup><i>r </i></sup><i>dF</i></span><span style="background-color: white; color: #444444; text-align: justify;">(</span><em style="background-color: white; color: #444444; font-family: inherit; 
text-align: justify;">x</em><span style="background-color: white; color: #444444; text-align: justify;">) ; <i>r</i> = 1, 2, ..... </span><span style="color: #444444;"> </span></div>
<div style="text-align: justify;">
<span style="color: #444444;"><br /></span></div>
<div style="text-align: justify;">
<span style="color: #444444;">I'm not sure about you, but I don't think I'd want to be evaluating all of those integrals if there was an easier way to obtain the moments! And you probably know that there is. We can use the moment generating function (if it exists), or the cumulant generating function (which always exists). In each case, we have two options. One involving successively differentiating the m.g.f or the c.g.f.; and the other involves using a Taylor series expansion and equating coefficients.</span></div>
<div style="text-align: justify;">
<span style="color: #444444;"><br /></span></div>
<div style="text-align: justify;">
<span style="color: #444444;">Even then, for some distributions, obtaining the moments in these ways can be rather tedious - especially if you have reason to obtain the moments up to a fairly high order.</span>
<div style="text-align: justify;">
<span style="color: #444444;"><br /></span></div>
<div style="text-align: justify;">
<span style="color: #444444;">Is there an even simpler way to proceed? I don't want to sound lazy, but computational efficiency is important.</span></div>
<div style="text-align: justify;">
<span style="color: #444444;"><br /></span></div>
<div style="text-align: justify;">
<b style="text-align: left;"><span style="color: #cc0000;">Recursions, recursions, .......</span></b></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Econometrics students should be pretty familiar with the discussion so far. Now, let's turn to some other relationships between the moments of certain distributions. Usually, these relationships don't come up in your typical econometrics course, but they're very useful - especially from a computational viewpoint.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Specifically, these relationships are in the form of recursion formulae.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I'm sure you know what a recursion is. It's a formula that takes us forward by building on the previous steps. As a really trivial example of this, consider the formula for the factorial of an integer, <i>n</i>. You know that <i>n</i><span style="text-align: left;">! = <i>n</i> (<i>n</i>-1)(<i>n</i>-2)(<i>n</i>-3).....3.2.1. So, we can write <i>n</i>! = <i>n</i> (<i>n</i>-1)! In other words, once we have computed (<i>n</i>-1)! we just multiply by <i>n</i> to move one step forward to <i>n</i>!, and so on.</span>
<br />
Because 1! = 1, we have 2! = 2 (1!) = 2; 3! = 3(2!) = 6; 4! = 4(3!) = 24; <i>etc</i>.<br />
<br />
<div style="text-align: justify;">
If we proceed sequentially in this way, we just have <i>one multiplication </i>to do at each new step, instead of a whole lot of them. I know that isn't a big deal, nor is there really that much computational cost-saving in this example. However, in more complex problems, recursion formulae can be very powerful tools.</div>
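In code, the one-multiplication-per-step idea looks like this (a trivial Python sketch):

```python
# The factorial recursion n! = n * (n-1)!: one multiplication per step.
def factorials_up_to(n):
    f = [1]                     # 0! = 1
    for k in range(1, n + 1):
        f.append(k * f[k - 1])  # k! = k * (k-1)!
    return f

print(factorials_up_to(4))  # [1, 1, 2, 6, 24]
```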
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Why is this interesting in the context of the moments of a random variable?</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Such recursion formulae aren't available for all of the distributions that you're likely to encounter. (It turns out that <i>most of them</i> require membership of the so-called <a href="https://en.wikipedia.org/wiki/Exponential_family">exponential families.</a>) But there are some interesting ones for discrete distributions that are relevant in econometrics. My follow-up post will deal with similar results for some continuous distributions, where we'll see that there's also an important connection with <a href="https://en.wikipedia.org/wiki/Stein%27s_lemma">Stein's Lemma</a>. </div>
<br />
<b><span style="color: blue;">Binomial distribution</span></b><br />
<div style="text-align: justify;">
<b><span style="color: blue;"><br /></span></b></div>
<div style="text-align: justify;">
A couple of useful references here are Zhang <i>et al</i>. (2018) and Riordan (1937).<br />
<br /></div>
<div style="text-align: justify;">
Strictly, this first example, and those that follow, aren't in the form of <i>direct</i> recursions - they're more like <i>indirect </i>recursions, but they each involve a neat "trick".<br />
<br /></div>
<div style="text-align: justify;">
Suppose that <i>X</i> ~ Bi[(<i>n</i> - 1) , <i>p</i>] and <i>Y</i> ~ Bi[<i>n</i> , <i>p</i>]. <span style="text-align: justify;">Consider a function <i>g</i>(<i>.</i>) such that both |E[<i>g</i>(<i>X</i>)]| and |<i>g</i>(-1)| are finite. Then, </span></div>
<div style="text-align: justify;">
<span style="text-align: justify;"><br /></span></div>
<div style="text-align: justify;">
<span style="text-align: justify;"> <b> E[<i>g</i>(<i>X</i>)] = </b></span><b>E[Y <i>g</i>(<i>Y</i> - 1) / (<i>np</i>)]</b> .</div>
<div style="text-align: justify;">
<span style="text-align: left;"><br /></span></div>
<div style="text-align: justify;">
<span style="text-align: left;">We can use this result to obtain the raw moments of <i>Y</i> recursively. If we let <i>g</i>(<i>Z</i>) = <i>Z</i><sup><i>k</i></sup>, then:</span></div>
<div style="text-align: justify;">
<span style="text-align: left;"><br /></span></div>
<div style="text-align: justify;">
<span style="text-align: left;"> <b> E[<i>X <sup>k</sup></i>] = E[Y(Y - 1)<sup>k</sup> / (np)]</b> .</span></div>
<div style="text-align: justify;">
</div>
<ul>
<li><span style="text-align: left;">If <i>k</i> = 0, then 1 = E[<i>Y</i>] / (<i>np</i>); and so <i style="text-align: justify;">μ</i><sub style="text-align: justify;">1</sub><span style="text-align: justify;">' =</span> E[<i>Y</i>] = <i>np</i>. (Clearly, E[<i>X</i>] = (<i>n</i> - 1)<i>p</i>. )</span></li>
<li><span style="text-align: left;">If <i>k</i> = 1, then E[<i>X</i>] = E[<i>Y</i>(<i>Y</i> - 1) / (<i>np</i>)]; and so </span><i>μ</i><sub>2</sub>' = E[<i>Y </i><sup>2</sup>] = <i>μ</i><sub>1</sub>' + <i>n</i>(<i>n</i> - 1)<i>p</i><sup>2</sup> = <i>np</i>[1 + (<i>n</i> - 1)<i>p</i>].</li>
</ul>
<span style="text-align: justify;">(So, the variance of </span><i style="text-align: justify;">Y</i><span style="text-align: justify;"> is </span><i style="text-align: justify;">μ</i><sub style="text-align: justify;">2</sub><span style="text-align: justify;"> = </span><i style="text-align: justify;">μ</i><sub style="text-align: justify;">2</sub><span style="text-align: justify;">' - (</span><i style="text-align: justify;">μ</i><sub style="text-align: justify;">1</sub><span style="text-align: justify;">')</span><sup style="text-align: justify;">2</sup><span style="text-align: justify;"> = </span><i style="text-align: justify;">np</i><span style="text-align: justify;">(1</span><i style="text-align: justify;">-p</i><span style="text-align: justify;">).</span><br />
<div style="text-align: justify;">
</div>
<ul>
<li>If <i>k</i> = 2, then E[<i>X </i><sup>2</sup>] = E[<i>Y</i>(<i>Y</i> - 1)<sup>2</sup> / (<i>np</i>)]; and so <i>μ</i><sub>3</sub>' = E[<i>Y </i><sup>3</sup>] = 2<i>μ</i><sub>2</sub>' + <i>μ</i><sub>1</sub>'{(<i>n</i> - 1)<i>p</i>[1 + (<i>n</i> - 2)<i>p</i>] - 1} = <i>n</i>(<i>n</i> - 1)(<i>n</i> - 2)<i>p</i><sup>3</sup> + 3<i>n</i>(<i>n</i> - 1)<i>p</i><sup>2</sup> + <i>np</i> .</li>
<li>If <i>k</i> = 3, then a similar calculation gives <i>μ</i><sub>4</sub>' = E[<i>Y </i><sup>4</sup>] = <i>np</i> + 7<i>n</i>(<i>n</i> − 1)<i>p</i><sup>2</sup> + 6<i>n</i>(<i>n</i> − 1)(<i>n</i> − 2)<i>p</i><sup>3</sup> + <i>n</i>(<i>n</i> − 1)(<i>n</i> − 2)(<i>n</i> − 3)<i>p</i><sup>4</sup>.</li>
</ul>
<div style="text-align: justify;">
and so on. Notice that the <i>r</i>th. moment is expressed in terms of the preceding moments. And of course, if it's the centered moments that we want, then the general formula in (1), above, allows us to calculate them easily from these raw moments.<br />
<br />
As an aside, a different type of recursion formula for these moments is derived by <a href="https://www.maa.org/programs/faculty-and-departments/classroom-capsules-and-notes/a-recursive-formula-for-moments-of-a-binomial-distribution">Bényi</a> and Manago (2005).</div>
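These closed-form expressions are easy to verify numerically. The code associated with this blog is in R (on the code page); purely as an illustrative cross-check, here's a self-contained sketch in Python (the function name is mine) that compares each raw moment above with a direct summation over the support of <i>Y</i> ~ Bin(<i>n</i>, <i>p</i>):

```python
from math import comb

def binom_raw_moment(n, p, k):
    """E[Y^k] for Y ~ Bin(n, p), by direct summation over the support."""
    return sum((y ** k) * comb(n, y) * p ** y * (1 - p) ** (n - y)
               for y in range(n + 1))

n, p = 10, 0.3
m1 = n * p                                                 # mu_1'
m2 = n * p * (1 + (n - 1) * p)                             # mu_2'
m3 = n * (n - 1) * (n - 2) * p ** 3 + 3 * n * (n - 1) * p ** 2 + n * p   # mu_3'
m4 = (n * p + 7 * n * (n - 1) * p ** 2 + 6 * n * (n - 1) * (n - 2) * p ** 3
      + n * (n - 1) * (n - 2) * (n - 3) * p ** 4)          # mu_4'

# Each closed form matches the brute-force expectation
for k, closed_form in enumerate([m1, m2, m3, m4], start=1):
    assert abs(binom_raw_moment(n, p, k) - closed_form) < 1e-9
```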
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<b><span style="color: blue;">Poisson distribution</span></b></div>
<div style="text-align: justify;">
<br />
A useful reference here is Hwang (1982).<br />
<br /></div>
<div style="text-align: justify;">
Let <i>X</i> follow a Poisson distribution with parameter <i>λ</i>. Consider a function <i>g</i>(<i>X</i>) such that both |E[<i>g</i>(<i>X</i>)]| and |<i>g</i>(-1)| are finite. Then,</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<b>E[<i>λ</i> g(<i>X</i>)] = E[<i>X</i> <i>g</i>(<i>X </i>- 1)] </b>.<br />
<br />
If we let g(<i>X</i>) = <i>X <sup>k</sup></i> we have:<br />
<br />
<b>E[<i>X <sup>k</sup></i>] = (1 / <i>λ</i>) E[<i>X</i> (<i>X</i> - 1)<i><sup>k</sup></i>] </b>(3)<br />
<br />
and we can use this formula to generate the raw moments of the Poisson distribution recursively.</div>
<ul style="text-align: left;">
<li style="text-align: justify;">If <i>k</i> = 0, then 1 = (1 / <i>λ</i>) E[<i>X</i>], or <i>μ</i><sub>1</sub>' = E[<i>X</i>] = <i>λ.</i></li>
<li style="text-align: justify;">If <i>k</i> = 1, then E[<i>X</i>] = (1 / <i>λ</i>) {E[<i>X </i><sup>2</sup>] - E[<i>X</i>]}, and so <i>μ</i><sub>2</sub>' = E[<i>X </i><sup>2</sup>] = (<i>λ</i> + 1) <i>μ</i><sub>1</sub>' = <i>λ</i>(<i>λ</i> + 1).</li>
</ul>
<div style="text-align: justify;">
(Immediately, the variance of <i>X</i> is <i>μ</i><sub>2</sub> = <i>μ</i><sub>2</sub>' - (<i>μ</i><sub>1</sub>')<sup>2</sup> = <i>λ</i>.)</div>
<ul style="text-align: left;">
<li style="text-align: justify;">If <i>k</i> = 2, then E[<i>X </i><sup>2</sup>] = (1 / <i>λ</i>) E[<i>X</i>(<i>X</i> - 1)<sup>2</sup>], and so <i>μ</i><sub>3</sub>' = E[<i>X </i><sup>3</sup>] = (<i>λ</i> + 2) <i>μ</i><sub>2</sub>' - <i>μ</i><sub>1</sub>' = <i>λ</i><sup>3</sup> + 3<i>λ</i><sup>2</sup> + <i>λ</i>.</li>
</ul>
<ul style="text-align: left;">
<li style="text-align: justify;">If <i>k</i>= 3, then E[<i>X </i><sup>3</sup>] = (1 / <i>λ</i>) E[<i>X</i>(<i>X </i>- 1)<sup>3</sup>], and so <i>μ</i><sub>4</sub>' = E[<i>X</i> <sup>4</sup>] = (<i>λ</i> + 3)<i>μ</i><sub>3</sub>' - 3<i>μ</i><sub>2</sub>' + <i>μ</i><sub>1</sub>' ; <i>etc.</i></li>
</ul>
<div style="text-align: justify;">
So, you see that once we have obtained <i>μ</i><sub>1</sub>' to <i>μ</i><sub><i>r</i></sub>', we can easily obtain <i>μ</i><sub><i>r</i>+1</sub>' for any <i>r</i>.<br />
<br />
The moments of <i>X</i> about its mean, <i>λ</i>, can then be obtained directly from these raw moments, using (1).</div>
<br />
<b><span style="color: blue;">Negative Binomial distribution</span></b><br />
<br />
Again, see Hwang (1982).<br />
<div style="text-align: justify;">
<br />
First, note that there are several forms of the Negative Binomial distribution. Here, I'm defining it in terms of a random variable, <i>X</i>, that counts the number of failures before the <i>r</i>th. success, where <i>p</i> is the probability of a success. This includes one version of the Geometric Distribution as a special case (when <i>r</i> = 1).<br />
<br /></div>
<div style="text-align: justify;">
Let <i>X</i> follow a Negative Binomial (NegBin) distribution with parameters <i>r</i> and <i>p</i>. Consider a function <i>g</i>(<i>X</i>) such that both |E[<i>g</i>(<i>X</i>)]| and |<i>g</i>(-1)| are finite. Then,</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<b> E[(1 - <i>p</i>) <i>g</i>(<i>X</i>)] = E[<i>X</i> <i>g</i>(<i>X</i> - 1) / (<i>r</i> + <i>X</i> - 1)]</b> .</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
If we let <i>g</i>(<i>X</i>) = (<i>X</i> + <i>r</i>)<i><sup>k</sup></i>, then</div>
<div style="text-align: justify;">
<br /></div>
<b>E[(<i>X</i> + <i>r</i>)<i><sup>k</sup></i>] = (1 - <i>p</i>)<sup>-1</sup> E[<i>X</i>(<i>X</i> + <i>r</i> - 1)<sup><i>k-</i>1</sup>]</b>.<br />
<br />
<div style="text-align: justify;">
Following the same procedure as for the Poisson distribution, we can again obtain successively higher moments recursively from the lower moments:</div>
<div style="text-align: justify;">
</div>
<ul>
<li>If <i>k</i> = 1, then (1 - <i>p</i>)E(<i>X</i> + <i>r</i>) = E(<i>X</i>); and so <i>μ</i><sub>1</sub>' = <i>r</i> (1 - <i>p</i>)/ <i>p</i>. </li>
<li>If <i>k</i> = 2, then (1 - <i>p</i>)E[(<i>X</i> + <i>r</i>)<sup>2</sup>] = E(<i>X </i><sup>2</sup>) + <i>r</i>E(<i>X</i>) - E(<i>X</i>); and so <i>μ</i><sub>2</sub>' = [<i>r</i>(1 - <i>p</i>) / <i>p</i><sup>2</sup>][1 + <i>r</i>(1 - <i>p</i>)] = [<i>μ</i><sub>1</sub>']<sup>2</sup> + <i>μ</i><sub>1</sub>' / <i>p</i>. </li>
</ul>
<div>
(Right away, we see that Var.[<i>X</i>] = <i>r</i>(1 - <i>p</i>) / <i>p</i><sup>2</sup> = <i>μ</i><sub>1</sub>' / <i>p</i>.)</div>
<div>
<ul style="text-align: left;">
<li>If <i>k</i> = 3, then (1 - <i>p</i>)E[(<i>X</i> + <i>r</i>)<sup>3</sup>] = E[<i>X </i><sup>3</sup>] + 2(<i>r</i> - 1)E[<i>X </i><sup>2</sup>] + (<i>r</i> - 1)<sup>2</sup>E[<i>X</i>]; and so <i>μ</i><sub>3</sub>' = {[<i>r</i>(1 - 3<i>p</i>) + 2] / <i>p</i>}<i>μ</i><sub>2</sub>' + {[<i>r</i><sup>2</sup>(2 - 3<i>p</i>) + 2<i>r</i> - 1] / <i>p</i>}<i>μ</i><sub>1</sub>' + <i>r</i><sup>3</sup>(1 - <i>p</i>) / <i>p</i> ; <i>etc.</i></li>
</ul>
<div>
<div style="text-align: justify;">
Again, once we have obtained <i>μ</i><sub>1</sub>' to <i>μ</i><sub><i>r</i></sub>', we can easily obtain <i>μ</i><sub><i>r</i>+1</sub>' for any <i>r</i>; and the central moments of <i>X</i> can be obtained from (1).<br />
<br />
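Once more, a quick numerical cross-check is reassuring. Here's an illustrative Python sketch (function name mine; the blog's code is in R) that verifies the NegBin raw moments above against direct summation, using the "failures before the <i>r</i>th. success" parameterization adopted here:

```python
from math import comb

def negbin_raw_moment(r, p, k, cutoff=500):
    """E[X^k], where X counts failures before the r-th success:
    P(X = x) = C(x + r - 1, x) p^r (1 - p)^x. Truncated direct summation."""
    return sum((x ** k) * comb(x + r - 1, x) * p ** r * (1 - p) ** x
               for x in range(cutoff))

r, p = 3, 0.4
m1 = r * (1 - p) / p                                   # mu_1'
m2 = (r * (1 - p) / p ** 2) * (1 + r * (1 - p))        # mu_2'
m3 = (((r * (1 - 3 * p) + 2) / p) * m2                 # mu_3', from the k = 3 case
      + ((r ** 2 * (2 - 3 * p) + 2 * r - 1) / p) * m1
      + r ** 3 * (1 - p) / p)

for k, closed_form in enumerate([m1, m2, m3], start=1):
    assert abs(negbin_raw_moment(r, p, k) - closed_form) < 1e-6
```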
<b><span style="color: #cc0000;">Obtaining "true" recursions</span></b><br />
<b><span style="color: blue;"><br /></span></b>
I noted earlier that one advantage of a recursion formula is that it can simplify and speed up computations. It may not be obvious, from the discussion of the three distributions above, that what we've seen so far achieves this.<br />
<br />
In fact, we can easily "extract" direct recursion formulae for the raw moments in each case. I'll use the Poisson distribution to illustrate this, and you should be able to apply the same approach to the other two distributions.<br />
<br />
Let's go back to equation (3), where we saw that<br />
<br />
<b>E[<i>X <sup>k</sup></i>] = (1 / <i>λ</i>) E[<i>X</i> (<i>X</i> - 1)<i><sup>k</sup></i>] </b>(3)<br />
<div>
<br /></div>
<div>
where <i>k</i> = 0, 1, 2, .....</div>
<div>
<br /></div>
<div>
From the binomial theorem, <i>X</i>(<i>X</i> - 1)<i><sup>k</sup></i> = Σ[<i>k</i>! / (<i>i</i>! (<i>k</i> - <i>i</i>)!)] <i>X</i> <sup><i>k</i>-<i>i</i>+1</sup> (-1)<sup><i>i </i></sup>, where the range of summation is from <i>i</i> = <b><span style="color: red;">0</span></b> to <i>i</i> = <i>k</i>. (Watch out - this range will change twice in what follows!)</div>
<div>
<br /></div>
<div>
So, from (3),</div>
<div>
<br /></div>
<div>
<b> </b>E[<i>X <sup>k </sup></i>] = (1 / <i>λ</i>) Σ[<i>k</i>! / (<i>i</i>! (<i>k</i> - <i>i</i>)!)](-1)<sup><i>i</i></sup> E[<i>X </i><sup><i>k</i>-<i>i</i>+1</sup>]<br />
<br />
= (1 / <i>λ</i>) Σ[<i>k</i>! / (<i>i</i>! (<i>k</i> - <i>i</i>)!)](-1)<sup><i>i</i></sup> E[<i>X </i><sup><i>k</i>-<i>i</i>+1</sup>] + (1 / <i>λ</i>)E[<i>X </i><sup><i>k</i>+1</sup>], (4)<br />
<br />
where now the range of summation is from <i>i</i> = <b><span style="color: red;">1</span></b> to <i>i</i> = <i>k</i>.<br />
<br />
Extracting the term involving <i>i </i>= 1 in the summation in (4), and re-arranging, we get:<br />
<br />
E[<i>X </i><sup><i>k</i>+1</sup>] = (<i>λ + k</i>)E[<i>X <sup>k </sup></i>] - Σ[<i>k</i>! / (<i>i</i>! (<i>k</i> - <i>i</i>)!)](-1)<sup><i>i</i></sup> E[<i>X </i><sup><i>k</i>-<i>i</i>+1</sup>] , (5)<br />
<br />
where the range of summation in (5) is for <i>i</i> = <b><span style="color: red;">2</span></b> to <i>i</i> = <i>k</i>.<br />
<br />
Setting <i>k</i> = 0, 1, 2, 3 we get the expressions for <i>μ</i><sub>1</sub>', <i>μ</i><sub>2</sub>', <i>μ</i><sub>3</sub>', and <i>μ</i><sub>4</sub>' for the Poisson distribution given earlier.<br />
<br />
Of course, what we have in (5) is a nice tidy recursion formula that enables us to calculate the (<i>k</i> + 1)th raw moment as a function of all of the preceding moments.<br />
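The recursion in (5) takes only a few lines of code. The R code for this post is on the blog's code page; here, just for illustration, is a minimal Python sketch (the function name is mine). With an integer <i>λ</i> the arithmetic is exact:

```python
from math import comb

def poisson_raw_moments(lam, kmax):
    """Raw moments E[X], ..., E[X^kmax] of X ~ Poisson(lam), via (5):
    E[X^(k+1)] = (lam + k) E[X^k] - sum_{i=2..k} [k!/(i!(k-i)!)] (-1)^i E[X^(k-i+1)]."""
    m = [1]                                   # m[k] = E[X^k]; E[X^0] = 1
    for k in range(kmax):
        m.append((lam + k) * m[k]
                 - sum(comb(k, i) * (-1) ** i * m[k - i + 1]
                       for i in range(2, k + 1)))
    return m[1:]

# With lam = 3 this yields exact integers; the last entry is the 20th raw moment,
# which the post reports as being roughly 1.4 x 10^18.
moments = poisson_raw_moments(3, 20)
```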
<br />
<b><span style="color: #cc0000;">A numerical example</span></b><br />
<br />
We can easily write some code that will allow us to compute as many moments of the Poisson distribution as we wish, very efficiently. Subject to numerical "overflow" limitations, of course!<br />
<br />
By way of a quick illustration, there's some R code (available on the <a href="https://davegiles.blogspot.com/p/code.html">code page</a> of this blog) that speedily computes the first ten raw moments for the Poisson distribution with <i>λ</i> = 1. Here are the results:<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjezF7AD1UP3TCjXMJPkvwHYEt5zSYbe8CFTT3LhWpRByNnw6N51xorH3H-5RnK0PbBjC176MvFkv5HnwDfgDsrchwLCDUFO5yfygyJFkPa3GnTghdr1D2uCWeY-xBOEReBdsLSiqY2qp_g/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" data-original-height="91" data-original-width="640" height="89" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjezF7AD1UP3TCjXMJPkvwHYEt5zSYbe8CFTT3LhWpRByNnw6N51xorH3H-5RnK0PbBjC176MvFkv5HnwDfgDsrchwLCDUFO5yfygyJFkPa3GnTghdr1D2uCWeY-xBOEReBdsLSiqY2qp_g/s640/Capture.JPG" width="640" /></a><br />
The R code simulates the moments and plots the p.m.f. of the empirical distribution.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiq04Ygro-t6L9-DPG2R6npl6Gfw9nTr_RNLj6lnWnc7vntPVo5lHDUxlPqHWSQ5FkpRQcbmVix1vpOPjOSEZ6bWSje1bsRkEZRjSbS63NS_PfGnEk4Y8DTO3LwOHtGLz7tD-fMnB74JwVD/s1600/Capture.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="747" data-original-width="729" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiq04Ygro-t6L9-DPG2R6npl6Gfw9nTr_RNLj6lnWnc7vntPVo5lHDUxlPqHWSQ5FkpRQcbmVix1vpOPjOSEZ6bWSje1bsRkEZRjSbS63NS_PfGnEk4Y8DTO3LwOHtGLz7tD-fMnB74JwVD/s400/Capture.JPG" width="390" /></a></div>
<br />
The recursion formula comes into its own if, for example, you want to compute the 20th. moment when <i>λ</i> = 3. The answer is 1.4 x 10<sup>18</sup>. Try working that one out by hand!<br />
<div>
<br /></div>
<div>
Obviously, you can amend the R code to choose your own value of <i>λ</i>, and the number of moments that you want to calculate. </div>
<br />
The <b>BIG</b> takeaway from this post? If you want to compute (lots of) the moments of some common discrete distributions, you should look beyond the standard approaches. There are some interesting recursion formulae that might help you.<br />
<br />
I have a follow-up post on its way that deals with similar results for some familiar continuous distributions. Watch out for it!</div>
</div>
</div>
</div>
<b style="text-align: justify;"><br /></b><b style="text-align: justify;">References</b><br />
<div style="text-align: justify;">
<br />
<a href="https://www.maa.org/programs/faculty-and-departments/classroom-capsules-and-notes/a-recursive-formula-for-moments-of-a-binomial-distribution">Bényi, Á. & S. M. Manago</a>, 2005. A recursive formula for moments of a binomial distribution. <i>College Mathematics Journal</i>, 36, 68-72.<br />
<br /></div>
<div style="text-align: justify;">
<a href="https://projecteuclid.org/euclid.aos/1176345876">Hwang, J. T.</a>, 1982. Improving on standard estimators in discrete exponential families with applications to Poisson and negative binomial cases. <i>Annals of Statistics</i>, 10, 857–867.<br />
<br />
<a href="https://www.jstor.org/stable/2957598?seq=1#page_scan_tab_contents">Riordan, J.</a>, 1937. Moment recurrence relations for binomial, Poisson and hypergeometric frequency distributions.<i> Annals of Mathematical Statistics</i>, 8, 103-111.</div>
<div style="text-align: justify;">
<br />
<a href="https://www.tandfonline.com/doi/abs/10.1080/03610926.2018.1435818?scroll=top&needAccess=true&journalCode=lsta20">Zhang, Y-Y., T-Z. Rong, & M-M. Li,</a> 2018. Expectation identity for the binomial distribution and its application in the calculations of high-order binomial moments. <i>Communications in Statistics - Theory and Methods</i>, available online.<br />
<br /></div>
<center>
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com0tag:blogger.com,1999:blog-2198942534740642384.post-61565952749631660242019-04-12T17:16:00.000-07:002019-04-12T17:16:04.519-07:002019 Econometric Game Results<div dir="ltr" style="text-align: left;" trbidi="on">
The <a href="http://econometricgame.nl/">Econometric Game</a> is over for another year.<br />
<br />
The winning team for 2019 was from the University of Melbourne.<br />
<br />
The second and third placed teams were from Maastricht University and Aarhus University, respectively.<br />
<br />
Congratulations to the winning teams, and to all who competed this year!<br />
<br />
<center>
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com0tag:blogger.com,1999:blog-2198942534740642384.post-70890321969473172972019-04-10T06:57:00.001-07:002019-04-10T11:24:31.487-07:00EViews 11 Now Available<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
As you'll know already, I'm a big fan of the <a href="http://www.eviews.com/">EViews</a> econometrics package. I always found it to be a terrific, user-friendly, resource when teaching economic statistics and econometrics, and I use it extensively in my own research.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Along with a lot of other EViews users, I recently had the opportunity to "test drive" the beta release of the latest version of this package, EViews 11. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<a href="http://www.eviews.com/EViews11/ev11main.html">EViews 11</a> has now been officially released, and it has some great <a href="http://www.eviews.com/EViews11/ev11whatsneweg.html">new features</a>. (Click on the links there to see some really helpful videos.) To see what's now available, check it out <a href="http://www.eviews.com/EViews11/ev11features.html">here</a>. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Nice update. Thanks!</div>
<div style="text-align: justify;">
<br /></div>
<center>
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com2tag:blogger.com,1999:blog-2198942534740642384.post-14091730654277319252019-04-09T06:44:00.003-07:002019-04-09T06:44:42.585-07:00SHAZAM!<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
This past weekend the new movie, <i><a href="https://www.imdb.com/title/tt0448115/">Shazam</a></i>, topped the box-office revenue list with over US$53 million - and that was its first weekend since being released.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Not bad!</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Of course, in the Econometrics World, we associate the word, SHAZAM, with Ken White's famous <a href="http://www.econometrics.com/">computing package</a>, which has been with us since 1977. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Ken and I go way back. A few years ago I had a post about the background to the SHAZAM package. In that post I explained what the acronym "SHAZAM" stands for. If you check it out you'll see why it's timely for you to know these important historical facts!</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
And while you're there, take a look at the links to other tales that illustrate Ken's well-known wry sense of humour.</div>
<div style="text-align: justify;">
<br /></div>
<center>
© 2019, David E. Giles
</center>
</div>
Dave Gileshttp://www.blogger.com/profile/05389606956062019445noreply@blogger.com1tag:blogger.com,1999:blog-2198942534740642384.post-24446720755402341292019-04-08T06:54:00.002-07:002019-04-09T17:35:21.707-07:00A Permutation Test Regression Example<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
In <a href="https://davegiles.blogspot.com/2019/04/what-is-permutation-test.html">a post last week</a> I talked a bit about Permutation (Randomization) tests, and how they differ from the (classical parametric) testing procedure that we generally use in econometrics. I'm going to assume that you've read that post.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
(<i>There may be a snap quiz at some point</i>!)</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I promised that I'd provide a regression-based example. After all, the two examples that I went through in that previous post were designed to expose the fundamentals of permutation/randomization testing. They really didn't have much "econometric content".</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
In what follows I'll use the terms "permutation test" and "randomization test" interchangeably.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
What we'll do here is to take a look at a simple regression model and see how we could use a randomization test to see if there is a linear relationship between a regressor variable, x, and the dependent variable, y. Notice that I said a "simple regression" model. That means that there's just the one regressor (apart from an intercept). Multiple regression models raise all sorts of issues for permutation tests, and we'll get to that in due course.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
There are several things that we're going to see here:</div>
<div style="text-align: justify;">
</div>
<ol>
<li style="text-align: justify;">How to construct a randomization test of the hypothesis that the regression slope coefficient is zero.</li>
<li style="text-align: justify;">A demonstration that the permutation test is "exact". That is, its significance level is exactly what we assign it to be.</li>
<li style="text-align: justify;">A comparison between a permutation test and the usual t-test for this problem.</li>
<li style="text-align: justify;">A demonstration that the permutation test remains "exact", even when the regression model is mis-specified by fitting it through the origin.</li>
<li style="text-align: justify;">A comparison of the powers of the randomization test and the t-test under this model mis-specification.</li>
</ol>
<div style="text-align: justify;">
<br />
<br />
<a name='more'></a><br />
<br />
<b><span style="color: #cc0000;">The Monte Carlo experiment</span></b><br />
<br />
A lot of this information will be revealed by means of a Monte Carlo experiment. The associated R code can be downloaded from the <a href="https://davegiles.blogspot.com/p/code.html">code page</a> for this blog. (Even if you're not an R user, you can read the code file with any text editor, and there are lots of comments in it to help you.)</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Now, let's get down to business. First, let's look at the regression model that we'll be working with. It's of the form:</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
y<sub>i</sub> = β<sub>1</sub> + β<sub>2</sub> x<sub>i</sub> + ε<sub>i</sub> ; i = 1, 2, 3, ...., n. (1)</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The regressor, x, is non-random, but apart from that <i>we're not really going to make any assumptions </i>beyond "exchangeability" of the sample observations. (See my <a href="https://davegiles.blogspot.com/2019/04/what-is-permutation-test.html">previous post</a> for an explanation of this.)</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
If (1) is the Data-Generating Process (DGP), as well as the form of the model that is estimated by OLS, then there is no model mis-specification. On the other hand if (1) is the DGP, but we "under-fit" the regression by omitting the intercept, then we have a mis-specified model. In the latter case, we know that this has adverse implications for the OLS estimator and for the <i>conventional</i> tests that we conduct.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
You'll know by now that to conduct a permutation test we have to choose a suitable test statistic, S. Given the null hypothesis that we want to test, one option would be to choose S = b<sub>2</sub>, where b<sub>2</sub> is the OLS estimator for β<sub>2</sub>. That would be just fine, even though it's not a pivotal statistic. Then we could keep the ordering of the y values in the sample, and consider permutations of the x data (or <i>vice versa</i> - it wouldn't matter). See Edgington (1987) or Noreen (1989).</div>
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Alternatively, we could recall that b<sub>2</sub> is proportional to the (Pearson) simple correlation coefficient between y and x. In fact the proportionality "constant" is the ratio of the sample standard deviations of y and x. This ratio stays the same as we permute the sample values. Moreover, for a set of x and y values, the correlation coefficient increases monotonically with b<sub>2</sub>.<br />
<br />
So, all we have to do is to test the <i>null hypothesis of "no correlation"</i>, and this will serve our purpose precisely. <i>The alternative hypothesis will be 2-sided</i>, namely that there is a linear correlation between x and y. This is a problem that was dealt with in Example 2 of my last post on permutation tests, so we know already what's involved.</div>
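The Monte Carlo experiment itself uses R (see below), but the mechanics of the test are easy to convey in a few lines. Here's a bare-bones sketch in Python (the function name and implementation details are mine, not from the experiment's code) of a two-sided permutation test for the slope, using the sample correlation as the statistic and permuting the y values:

```python
import random

def perm_test_slope(x, y, n_perm=2000, seed=123):
    """Approximate two-sided permutation test of H0: no linear association
    between x and y. The statistic is the sample correlation, which equals
    the OLS slope b2 rescaled by sd(y)/sd(x), so the two are equivalent."""
    def corr(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
        saa = sum((ai - ma) ** 2 for ai in a)
        sbb = sum((bi - mb) ** 2 for bi in b)
        return sab / (saa * sbb) ** 0.5

    rng = random.Random(seed)          # fixed seed for reproducibility
    observed = abs(corr(x, y))
    yy = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(yy)                # destroy the x-y pairing; keep x's order
        if abs(corr(x, yy)) >= observed:
            hits += 1
    return hits / n_perm               # fraction of permutations at least as extreme
```

With 2,000 randomly selected permutations this mirrors the number of selections used for the smaller sample sizes in the experiment.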
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This time our R code will be a bit more efficient than before. We'll take advantage of Steve Garren's <a href="https://cran.r-project.org/web/packages/jmuOutlier/index.html"><b>jmuOutlier</b> package</a> for R to implement the permutation test.<br />
<br />
Sample sizes of n = 15, 30, and 60 are considered separately. (This is hardly a situation where we can appeal to "large-sample asymptotics".) The numbers of possible permutations are 1.308x10<sup>12</sup>, 2.653x10<sup>32</sup>, and 8.321x10<sup>81</sup> for n = 15, 30, and 60 respectively. Obviously, we'll just use a random selection of these possible permutations! For n = 15, 30 we'll use 2,000 selections; and we'll use 5,000 selections for n = 60. A bit of experimentation shows that these numbers are sufficient for the accuracy that we want, as is the number of Monte Carlo replications that we'll use, namely 2,000.<br />
<br /></div>
<div style="text-align: justify;">
Within each replication of the Monte Carlo experiment what we do is construct and apply the randomization test. We compute its p-value and then keep track of those p-values that are less than or equal to our desired significance level, which will be 5%. If the null hypothesis is true, the observed fraction of such p-values over our 2,000 Monte Carlo replications will be 5%. (Any departure from this value would be due to using too few replications, or too few randomly chosen permutations.)</div>
<div style="text-align: justify;">
<br />
At each replication of the experiment we also apply the usual t-test for a zero slope. Note that the latter is based on lots of assumptions - such as errors that are independent, homoskedastic, and Normally distributed - whereas the permutation test is free of these.<br />
<br />
<b><span style="color: #cc0000;">Significance levels</span></b><br />
<br />
To get a "fair" comparison between the two types of tests at the outset, the DGP in (1) uses random disturbances that are i.i.d. Normal, with a zero mean and a constant variance. So, when the null hypothesis is true and the model that is estimated is correctly specified, the t-test should reject the null hypothesis in 5% of the replications of the experiment if we use the appropriate Student-t critical value(s).<br />
<br />
So, with this in mind, let's take a look at a partial set of our results, for the case where n = 30. In Table 1, <i><b>the regression is fitted through the origin</b>, </i>so unless the value of β<sub>1 </sub>in the DGP is zero, the estimated model would be mis-specified:<br />
<br />
<div style="text-align: center;">
<b>Table 1</b></div>
</div>
<div style="text-align: justify;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMVpJh9z33u0dYyuroJ_QLAJ4nDhbN2XN0z2U3zSREsIDPU6C0Js4gP9Z-apOHaXkIWvib2SjpPQRca2u6q1CtN6JCdl45moNYyzKx3v9OWQRFTnoeaq_vO0i-ism41jt6UrITPwahGvBl/s1600/regression+blog+2.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" data-original-height="349" data-original-width="432" height="258" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMVpJh9z33u0dYyuroJ_QLAJ4nDhbN2XN0z2U3zSREsIDPU6C0Js4gP9Z-apOHaXkIWvib2SjpPQRca2u6q1CtN6JCdl45moNYyzKx3v9OWQRFTnoeaq_vO0i-ism41jt6UrITPwahGvBl/s320/regression+blog+2.JPG" width="320" /></a></div>
<div style="text-align: justify;">
<br />
We'll come back to this table shortly, but for now just focus on the row that is highlighted in light green. This corresponds to the case where both β<sub>1</sub> and β<sub>2</sub> are zero. That is, the DGP in (1) has no intercept, so the model is correctly specified, and the null hypothesis of a zero slope is true.<br />
<br />
Because the null hypothesis is true, the power of the test is just its significance level (5%). Notice that the reported empirical significance levels match the anticipated 5%!<br />
<br />
This is good news. Given the particular errors that were used in the DGP in the simulations, <i>this had to happen for the t-test</i>. However, the result for the randomization test could be misleading. Why? Well, we could obtain the 5% rejection rate correctly, even if the distribution of the p-values from which this rate was calculated is "weird".<br />
<br />
In <a href="https://davegiles.blogspot.com/2011/04/may-i-show-you-my-collection-of-p.html">an old post</a> on this blog I discussed the sampling distribution of a p-value. One point that I covered was that if the null hypothesis is true, then this sampling distribution <i>has to be Uniform</i> on [0 , 1], regardless of the testing problem! This result gives another way of checking if the code for our simulation experiment is performing accurately, and that we've used enough replications and random selections of the permutations.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Figure 1 shows the distribution of the 2,000 p-values for the permutation test, <i>for the situation in the first row of results in Table 1</i>:<br />
<br />
<div style="text-align: center;">
<b>Figure 1</b></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhyrlvf-Q2ZxhbYSKLrK2ai1oVcw55C2F10gk0hhVm8M6XRB8wfivmk6amiDzlHgDZhO6wiDPCZm435KJLQE4bvoJEe9Z5QfQAdRFCc06v5lhjUlR6fSHniGx2-mkID5uQGhRuazX861xNE/s1600/regression+blog+1.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="635" data-original-width="634" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhyrlvf-Q2ZxhbYSKLrK2ai1oVcw55C2F10gk0hhVm8M6XRB8wfivmk6amiDzlHgDZhO6wiDPCZm435KJLQE4bvoJEe9Z5QfQAdRFCc06v5lhjUlR6fSHniGx2-mkID5uQGhRuazX861xNE/s400/regression+blog+1.JPG" width="398" /></a></div>
<span style="text-align: left;"><br /></span>
<span style="text-align: left;">As we can see, this distribution is "reasonably uniform", as required. More formally, using the <b><a href="https://cran.r-project.org/web/packages/uniftest/">uniftest</a></b> package in R we find that the Kolmogorov-Smirnov test statistic for uniformity is D = 0.998 (p = 0.24); and the Kuiper test statistic is V = 1.961 (p = 0.23). </span><span style="text-align: left;">So, we seem to be in good shape here.</span><br />
<span style="text-align: left;"><br /></span>
<span style="text-align: left;">So far, all that we seem to have shown is that the permutation test and the t-test exhibit no "size-distortion" when the model is correctly specified, and the errors satisfy the assumptions needed for the t-test to be valid. Well, whoopee!</span><br />
<span style="text-align: left;"><br /></span>
<span style="text-align: left;">Let's take a look back at Table 1, and now focus on the second line of results (highlighted in orange). Recall that the estimated model omits the intercept - the model is fitted through the origin. However, now, in Table 1, </span>β<sub>1 </sub><span style="text-align: left;">= 1, so the DGP includes an intercept and the fitted model is under-specified. <i>We've omitted a (constant) regressor from the estimated model</i>.</span><br />
<span style="text-align: left;"><br /></span>
<span style="text-align: left;">In this case the usual t-statistic follows a non-central Student-t distribution, with a non-centrality parameter that increases monotonically with </span>β<sub>1</sub><span style="text-align: left;"><sup>2</sup>, and which depends on the x data and the variance of the error term. <i>Its value is unobservable</i>! And w</span><span style="text-align: left;">e can't use the critical value(s) from the non-central t distribution if we don't know the value of the non-centrality parameter.</span><br />
<span style="text-align: left;"><br /></span>
<span style="text-align: left;">Obviously, the usual (central) Student-t critical values are no longer correct, and the <i>observed</i> significance level (the rejection rate of the null in the experiment when the null is true) will differ from 5%. Depending on the situation, it may be less than 5%, or greater than 5%. The extent of this difference is the "size-distortion" associated with the test when we mis-apply it in this way. </span><br />
<span style="text-align: left;"><br /></span>
<span style="text-align: left;">In the second line of Table 1 we see two important things. First, the t-test has a downwards size-distortion. We wanted to apply the test at the 5% significance level, but in fact it only rejected the null, when it was true, 1% of the time! Of course, if we just mis-applied this test once, in an application, we would have no idea if there was any substantial size-distortion or not.</span><br />
<span style="text-align: left;"><br /></span>
<span style="text-align: left;">The second (<i><b>really neat</b></i>) thing that we see in the second line of that table is that the permutation test still has a significance level of 5%! Even though the model is mis-specified, this doesn't affect the test - at least in terms of it still being "exact" with respect to the significance level that we wanted to achieve. </span><br />
<span style="text-align: left;"><i><b><br /></b></i></span>
<span style="text-align: left;"><i><b>And this is a totally general result</b></i>!</span><br />
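To see the flavour of this result for yourself, here is a small Monte Carlo sketch in Python (again my own, not the code behind Table 1 - the x draw, sample size, and error variance are arbitrary, so the rejection rates won't reproduce the table's entries). The DGP contains an intercept but the fitted model goes through the origin, and we compare the t-test's rejection rate with that of a permutation test based on the through-origin slope.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def perm_pvalue(x, y, rng, nperm=199):
    """Two-sided permutation p-value for the through-origin slope
    b = sum(xy)/sum(x^2); permuting y breaks any pairing with x."""
    b_obs = x @ y / (x @ x)
    count = 1                              # the observed statistic counts itself
    for _ in range(nperm):
        b = x @ rng.permutation(y) / (x @ x)
        count += abs(b) >= abs(b_obs)
    return count / (nperm + 1)

n, nrep, alpha = 15, 500, 0.05
x = rng.normal(size=n)                     # x data fixed in repeated samples
rej_t = rej_perm = 0
for _ in range(nrep):
    y = 1.0 + rng.normal(size=n)           # DGP has an intercept; slope is zero
    # t-test for the slope in the (mis-specified) through-origin model
    b = x @ y / (x @ x)
    resid = y - b * x
    se = np.sqrt(resid @ resid / (n - 1) / (x @ x))
    rej_t += 2 * stats.t.sf(abs(b / se), df=n - 1) < alpha
    rej_perm += perm_pvalue(x, y, rng) <= alpha
print(f"t-test rejection rate:      {rej_t / nrep:.3f}")
print(f"permutation rejection rate: {rej_perm / nrep:.3f}")
```

The permutation rate should sit close to 5% (the y's are exchangeable, and independent of x, under the null), while the t-test's rate will generally be distorted - in which direction depends on the x data drawn.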
<span style="text-align: left;"><br /></span>
<span style="color: #cc0000; text-align: left;"><b>Power considerations</b></span><br />
<br /></div>
<div style="text-align: justify;">
So much for the significance level, and size-distortion. What about the powers of the tests that we're considering?</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
If we move down Table 1 we still have a mis-specified model (because <span style="text-align: justify;">β</span><sub style="text-align: justify;">1 </sub>≠ 0), but the value of the slope coefficient is increasing. That is, the null hypothesis is becoming <i>more and more false</i>. We're now looking at the power functions for the tests. Because the size of the t-test is distorted downwards, it's not surprising that this test has lower power than the permutation test in this particular case. (More generally, that need not be the case - the power curves could intersect at some point.)</div>
<br />
<div style="text-align: justify;">
Figures 2 and 3 show the corresponding power curves when n = 15, and n = 60. Note that the ranges of the horizontal axes differ across Figures 1 to 3. This reflects the fact that both tests are "consistent": their powers approach the value "1" (as the null becomes increasingly false) more rapidly as n grows.</div>
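As a quick illustration of this consistency property, the following sketch (my own, with an arbitrary and correctly specified DGP, rather than the one behind Figures 2 and 3) traces the slope t-test's empirical power over a few slope values for n = 15 and n = 60:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def t_power(n, beta2, rng, nrep=1000, alpha=0.05):
    """Empirical rejection rate of the usual slope t-test in a correctly
    specified model y = 1 + beta2*x + e, at nominal level alpha."""
    rej = 0
    for _ in range(nrep):
        x = rng.normal(size=n)
        y = 1.0 + beta2 * x + rng.normal(size=n)
        rej += stats.linregress(x, y).pvalue < alpha
    return rej / nrep

# Power at the null (beta2 = 0) and at two alternatives, for each sample size
power = {n: [t_power(n, b, rng) for b in (0.0, 0.3, 0.6)] for n in (15, 60)}
print(power)
```

At β<sub>2</sub> = 0 the rejection rate stays near the nominal 5%; at any fixed alternative the power is markedly higher at n = 60 than at n = 15, which is exactly why the horizontal axes in the figures need different ranges.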
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Also, note that while the results in Figure 2 are similar to those seen in Table 1, those in Figure 3 are quite different. In the latter case the size-distortion for the t-test is <i>upwards</i> (at 7.8%) and its power curve lies <i>above</i> that of the permutation test. It's not that the t-test is (necessarily) more powerful than the permutation test. We can't compare (true) power unless the two tests have the same significance level - and they don't in this case.</div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<b>Figure 2</b></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgK_EjoVWXg2cx2r3OA7gt174Wmz10rEjTI38ropazGDG5aTHi1CXACdbvvH2zPP_kMeqHYCTJ9a3QwmyxTzJT6XWNbjvThIyKARwWxDnmmid22a7nZCL1nMb4lJc0TEv7iluKwpulHIJsS/s1600/Reegress+power+n15.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="631" data-original-width="640" height="393" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgK_EjoVWXg2cx2r3OA7gt174Wmz10rEjTI38ropazGDG5aTHi1CXACdbvvH2zPP_kMeqHYCTJ9a3QwmyxTzJT6XWNbjvThIyKARwWxDnmmid22a7nZCL1nMb4lJc0TEv7iluKwpulHIJsS/s400/Reegress+power+n15.JPG" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<b><br /></b></div>
<div class="separator" style="clear: both; text-align: center;">
<b>Figure 3</b></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbxdQkhufLrOcHQJETd8tlLp2U2DtHcdQvtjv8oQstcRgeLt8t2scxKtfsVpiti0sZDdnGp3JAWwiIyDj_0K_v7RKe7mOdt-VglZg4krPpOtT_T6phlk4yt_7enZigzidY_uFgxBGHURsn/s1600/Reegress+power+n60.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="633" data-original-width="657" height="385" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbxdQkhufLrOcHQJETd8tlLp2U2DtHcdQvtjv8oQstcRgeLt8t2scxKtfsVpiti0sZDdnGp3JAWwiIyDj_0K_v7RKe7mOdt-VglZg4krPpOtT_T6phlk4yt_7enZigzidY_uFgxBGHURsn/s400/Reegress+power+n60.JPG" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div style="text-align: justify;">
The detailed results from the Monte Carlo experiment on which Table 1 and Figures 2 and 3 are based can be found in an Excel spreadsheet on the <a href="https://davegiles.blogspot.com/p/data.html">data page</a> for this blog.</div>
<div style="text-align: justify;">
<br />
<b><span style="color: #cc0000;">Be careful!</span></b><br />
<br /></div>
<div style="text-align: justify;">
Three caveats are in order here:<br />
<br />
<b>1.</b> We set <span style="text-align: justify;">β</span><sub style="text-align: justify;">1 </sub>= 1 in the DGP and this resulted in a mis-specified "fitted" regression model. This chosen value for <span style="text-align: justify;">β</span><sub style="text-align: justify;">1 </sub>affects the (numerical) results. The <i>magnitudes</i> of the size distortions and the powers are specific to this choice.<br />
<br />
<b>2.</b> We need to be careful, here, when we talk about comparisons between the powers of the two tests (and this comment is universally applicable). Strictly, power comparisons are valid only when the tests have the same (actual, empirical) significance level. The one exception is when Test A has a <i>lower </i>actual significance level than Test B, but Test A has a <i>higher</i> rejection rate than Test B when the null is false (<i>i.e.</i>, higher "raw", or apparent, power) over the full parameter space. Then, Test A is unambiguously more powerful than Test B.<br />
<br />
In all other cases where there is size distortion, the "true" power will not be clear. One way to deal with this is to "size-adjust" any test that exhibits size distortion. This would be done, in our Monte Carlo experiment, by "jiggling" the critical values used for the t-test to ones that ensure that the test has a 5% rejection rate when the null is true. Then we could proceed to generate the corresponding power curves and make valid comparisons.<br />
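In a Monte Carlo setting the "jiggling" is straightforward: simulate the test statistic under the true null DGP and take the appropriate empirical quantile as the adjusted critical value. Here is a sketch, using the same sort of mis-specified through-origin set-up as above (illustrative values only, chosen by me):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, nrep = 15, 5000
x = rng.normal(size=n)                       # x data fixed in repeated samples

def t_stat_origin(x, y):
    """t-statistic for the slope in a regression through the origin."""
    b = x @ y / (x @ x)
    resid = y - b * x
    se = np.sqrt(resid @ resid / (len(y) - 1) / (x @ x))
    return b / se

# Null distribution of |t| under the TRUE DGP (intercept present, slope zero),
# even though the fitted model omits the intercept.
tnull = np.array([abs(t_stat_origin(x, 1.0 + rng.normal(size=n)))
                  for _ in range(nrep)])
c_adj = np.quantile(tnull, 0.95)             # size-adjusted 5% critical value
c_std = stats.t.ppf(0.975, df=n - 1)         # the naive (central t) value
print(f"adjusted: {c_adj:.3f}, standard: {c_std:.3f}")
```

By construction, rejecting when |t| exceeds `c_adj` has a rejection rate of (essentially) 5% under this DGP, so power curves built from `c_adj` are the "size-adjusted" ones.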
<br />
That's fine, but of course <i>in practice</i> we wouldn't know what modified critical values to use (unless we conducted a Monte Carlo experiment every time we undertook a real-life application). Perhaps this is worthy of another blog post at some stage, perhaps as an addition to my earlier ones on the basics of Monte Carlo simulation - <a href="https://davegiles.blogspot.com/2013/04/a-first-encounter-with-monte-carlo.html">here</a>, <a href="https://davegiles.blogspot.com/2016/11/monte-carlo-simulation-basics-i.html">here</a>, and <a href="https://davegiles.blogspot.com/2016/11/monte-carlo-simulation-basics-ii.html">here</a>.<br />
<br />
<b>3.</b> As I stressed from the outset, model (1) is only a <i>simple</i> regression model. Multiple regression models are much more interesting, but this is where things get a lot trickier when it comes to constructing an <i>exact</i> permutation test. Quite a lot has been written about this problem. Some of the issues are discussed quite nicely by Kennedy (1995), but the (statistics) literature has moved along a bit since then and some of his suggestions are no longer supported. Some key references include Anderson and ter Braak (2003), Huh and Jhun (2001), Kim <i>et al</i>. (2000), LePage and Podgórski (1996), Oja (1987), and Schmoyer (1994).<br />
<br />
Anderson and Robinson (2001) provide simulation evidence that favours the particular permutation procedure suggested by Freedman and Lane (1983). More recently, Nyblom (2015) introduces a permutation procedure based on the regression residuals, and illustrates its application to problems involving tests for autocorrelation, heteroskedasticity, and structural breaks.<br />
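For concreteness, here is my loose sketch of the Freedman-Lane procedure for testing a single coefficient in a multiple regression: fit the reduced model that excludes the regressor of interest, permute its residuals, add them back to the reduced-model fitted values to form pseudo-data, and re-fit the full model on each pseudo-sample. (I've used the raw coefficient as the test statistic for simplicity; the t-statistic is the more usual choice, and with nuisance parameters present the test is approximately, rather than exactly, valid.)

```python
import numpy as np

rng = np.random.default_rng(11)

def ols(X, y):
    """OLS coefficients, fitted values, and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fit = X @ beta
    return beta, fit, y - fit

def freedman_lane(y, Z, x, rng, nperm=499):
    """Freedman-Lane permutation p-value for H0: the coefficient on x is
    zero, with nuisance regressors Z (Z includes the intercept column)."""
    Xfull = np.column_stack([Z, x])
    b_obs = ols(Xfull, y)[0][-1]
    _, fit_r, res_r = ols(Z, y)                  # reduced model: y on Z only
    count = 1
    for _ in range(nperm):
        y_star = fit_r + rng.permutation(res_r)  # permute reduced residuals
        count += abs(ols(Xfull, y_star)[0][-1]) >= abs(b_obs)
    return count / (nperm + 1)

n = 30
z = rng.normal(size=n)
x = rng.normal(size=n)
Z = np.column_stack([np.ones(n), z])
y = 2.0 + 0.5 * z + rng.normal(size=n)           # true coefficient on x is zero
p_null = freedman_lane(y, Z, x, rng)
p_alt = freedman_lane(y + 2.0 * x, Z, x, rng)    # add a strong effect of x
print(f"p under H0: {p_null:.3f};  p under a strong alternative: {p_alt:.3f}")
```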
<br /></div>
<span style="color: #cc0000;"><b>Summing up</b></span><br />
<br />
Here are the takeaways from this post:<br />
<br />
<ul style="text-align: left;">
<li style="text-align: justify;">Permutation tests are nonparametric, or "distribution free". Unlike the usual tests that we use in econometrics, we don't need to satisfy a whole lot of (possibly questionable) parametric and distributional assumptions.</li>
<li style="text-align: justify;">Permutation tests are easy to apply, even though they can be (moderately) computationally intensive.</li>
<li style="text-align: justify;">Permutation tests are "exact", in the sense that they achieve precisely the significance level that we want them to. There is no "size distortion".</li>
<li style="text-align: justify;">Usually, we have lots of legitimate, and simple, choices for the test statistic in any given problem. In each case, the test will be exact, though its power may vary depending on our choice.</li>
<li style="text-align: justify;">A permutation test will still be exact, even if the model is mis-specified. This stands in stark contrast to the usual parametric tests that we use.</li>
<li style="text-align: justify;">A permutation test may be more, or less, powerful than its parametric counterparts, depending on the situation.</li>
<li style="text-align: justify;">If you plan to use permutation (randomization) tests in the context of the <i>multiple</i> regression model (or its extensions), then there are some pitfalls that you need to be aware of. Be sure to look at the references that I've supplied.</li>
</ul>
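To illustrate the point about the choice of test statistic, here is a toy example of my own: in the simple regression case, basing the permutation test on the OLS slope or on the sample correlation gives identical p-values, because permuting y leaves sd(x) and sd(y) unchanged and slope = r·sd(y)/sd(x), so the two statistics rank the permutations identically.

```python
import numpy as np

rng = np.random.default_rng(5)
n, nperm = 20, 299
x = rng.normal(size=n)
y = 0.4 * x + rng.normal(size=n)
perms = [rng.permutation(y) for _ in range(nperm)]   # shared permutations

def perm_p(stat):
    """Two-sided permutation p-value for a given test statistic."""
    s_obs = abs(stat(x, y))
    exceed = sum(abs(stat(x, yp)) >= s_obs for yp in perms)
    return (1 + exceed) / (nperm + 1)

slope = lambda x, y: np.polyfit(x, y, 1)[0]          # OLS slope (with intercept)
corr = lambda x, y: np.corrcoef(x, y)[0, 1]          # sample correlation
p_slope, p_corr = perm_p(slope), perm_p(corr)
print(p_slope, p_corr)                               # the two p-values coincide
```

The power of the test can still differ across statistics that are *not* monotone transformations of one another, which is the point made in the bullet above.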
<br />
Other than that - Happy Testing!<br />
<br />
<div style="text-align: justify;">
<b>References</b></div>
<div style="text-align: justify;">
<br />
<div style="-webkit-text-stroke-width: 0px; color: black; font-family: "Times New Roman"; font-size: medium; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: justify; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div style="margin: 0px;">
<a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/1467-842X.00156">Anderson, M. J. & J. Robinson</a>, 2001. Permutation tests for linear models. <i>Australian and New Zealand Journal of Statistics</i>, 43, 75-88. </div>
</div>
<br />
<a href="https://www.uvm.edu/~dhowell/StatPages/R/RandomizationTestsWithR/andersonTerBrack.pdf">Anderson, M. J., & C. J. F. ter Braak</a>, 2003. Permutation tests for multi-factorial analysis of variance. <i>Journal of Statistical Computation and Simulation</i>, 73, 85–113.<br />
<br />
<a href="https://www.amazon.com/Randomization-Tests-Statistics-Textbooks-Monographs/dp/0824768787/ref=sr_1_3?keywords=E.+S.+Edgington&qid=1554666451&s=gateway&sr=8-3">Edgington, E. S.</a>, 1987. <i>Randomization Tests</i>. Marcel Dekker, New York.<br />
<br />
<a href="https://amstat.tandfonline.com/doi/abs/10.1080/07350015.1983.10509354#.XKpQH5hKhPY">Freedman, D. & D. Lane</a>, 1983. A nonstochastic interpretation of reported significance levels. <i>Journal of Business and Economic Statistics</i>, 1, 292-298.<br />
<br />
<a href="https://www.tandfonline.com/doi/abs/10.1081/STA-100106060">Huh, M-H. & M. Jhun</a>, 2001. Random permutation testing in multiple linear regression. <i>Communications in Statistics - Theory and Methods</i>, 30, 2023–2032.<br />
<br />
<div style="-webkit-text-stroke-width: 0px; color: black; font-family: "Times New Roman"; font-size: medium; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: justify; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<div style="margin: 0px;">
<a href="https://amstat.tandfonline.com/doi/abs/10.1080/07350015.1995.10524581#.XKpQiZhKhPY">Kennedy, P. E.</a>, 1995. Randomization tests in econometrics. <i>Journal of Business and Economic Statistics</i>, 13, 85-94.</div>
</div>
<br />
<a href="https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.811">Kim, H.-J., M. P. Fay, E. J. Feuer, & D. Midthune</a>, 2000. Permutation tests for joinpoint regression with applications to cancer rates. <i>Statistics in Medicine,</i> 19, 335–351.<br />
<br />
<a href="https://core.ac.uk/download/pdf/82336230.pdf">LePage, R. & K. Podgórski</a>, 1996. Resampling permutations in regression without second moments. <i>Journal of Multivariate Analysis</i>, 57, 119–141.<br />
<br />
<a href="https://www.amazon.com/Computer-Intensive-Methods-Testing-Hypotheses-Introduction/dp/B01JXSBCXI/ref=sr_1_fkmr0_1?keywords=Noreen%2C+E.+W.%2C+1989.+Computer+Intensive+Methods+for+Testing+Hypotheses%3A+An+Introduction.+Wiley%2C+New+York.&qid=1554666381&s=gateway&sr=8-1-fkmr0">Noreen, E. W.</a>, 1989. <i>Computer Intensive Methods for Testing Hypotheses: An Introduction</i>. Wiley, New York.<br />
<br />
<a href="https://link.springer.com/chapter/10.1007/978-3-319-22404-6_5">Nyblom, J.</a>, 2015. Permutation tests in linear regression. Chapter 5 in K. Nordhausen & S. Taskinen (eds.), <i>Modern Nonparametric, Robust and Multivariate Methods</i>. Springer International, Switzerland.<br />
<br />
<a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-842X.1987.tb00724.x">Oja, H.</a>, 1987. On permutation tests in multiple regression and analysis of covariance problems. <i>Australian Journal of Statistics</i>, 29, 91–100.<br />
<br />
<a href="https://www.jstor.org/stable/2291013?seq=1#page_scan_tab_contents">Schmoyer, R. L.</a>, 1994. Permutation tests for correlation in regression errors. <i>Journal of the American Statistical Association</i>, 89, 1507–1516.<br />
<br /></div>
<center>
© 2019, David E. Giles
</center>
</div>