Monday, April 21, 2014

More On the Limitations of the Jarque-Bera Test

Testing the validity of the assumption that the errors in a regression model are normally distributed is a standard pastime in econometrics. We use this assumption when we construct standard confidence intervals for, or test hypotheses about, the parameters of our models. In a post some time ago I pointed out that this assumption is actually sufficient, but not necessary, for the validity of these inferences.

More recently, here and here, I discussed some aspects of the normality test that most econometricians use - the asymptotically valid test of Jarque and Bera (1987). Let's refer to this as the JB test. In the first of those posts I made brief mention of the finite-sample properties of the JB test, and I concluded:
"However, more recent evidence suggests that the power of the J-B test can be quite low in small samples, for a number of important alternative hypotheses - e.g., see Thadewald and Buning (2004). I'll address this aspect of the J-B test more fully in a later post."
The main results obtained by Thadewald and Buning are summed up in the abstract to their paper:

"It turns out that for the Jarque-Bera test the approximation of critical values by the chi-square distribution does not work very well. The test is superior in power to its competitors for symmetric distributions with medium up to long tails and for slightly skewed distributions with long tails. The power of the Jarque-Bera test is poor for distributions with short tails, especially if the shape is bimodal, sometimes the test is even biased."
In their Monte Carlo analysis of the power of the JB test, the alternative hypotheses are generated by using "contaminated normal" distributions for the regression errors. Their reference to a "biased test" is to one whose power function dips below the significance level, at least over part of the parameter space. You'll recognise that this is a very undesirable property for any test to have. It means that, at least part of the time, the probability that the test will correctly reject false hypotheses is less than the probability that it will wrongly reject a true hypothesis!
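To make the idea of a "biased" test concrete, here is a minimal Monte Carlo sketch in Python. It is not Thadewald and Buning's design: the sample size (n = 20), the number of replications, and the bimodal alternative (an equal-weight mixture of N(-2, 1) and N(2, 1)) are my own illustrative assumptions, and the test is applied to raw observations rather than to regression residuals. The critical value is the asymptotic chi-square(2) one, which has the closed form -2 ln(alpha).

```python
import math
import random


def jarque_bera(x):
    """JB = n * (S^2 / 6 + (K - 3)^2 / 24), where S and K are the sample
    skewness and kurtosis (central moments computed with divisor n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n * (skew ** 2 / 6.0 + (kurt - 3.0) ** 2 / 24.0)


def chi2_2_critical(alpha):
    """Upper-alpha point of chi-square(2); its survival function is exp(-x/2)."""
    return -2.0 * math.log(alpha)


def draw_normal(rng):
    return rng.gauss(0.0, 1.0)


def draw_bimodal(rng):
    # Hypothetical alternative: equal-weight mixture of N(-2, 1) and N(2, 1),
    # i.e. a symmetric, short-tailed, bimodal distribution.
    centre = -2.0 if rng.random() < 0.5 else 2.0
    return rng.gauss(centre, 1.0)


def rejection_rate(draw, n, reps, alpha, rng):
    """Share of replications in which JB exceeds the asymptotic critical value."""
    crit = chi2_2_critical(alpha)
    rejections = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        if jarque_bera(sample) > crit:
            rejections += 1
    return rejections / reps


alpha = 0.05
rng = random.Random(12345)  # fixed seed so the run is reproducible
size = rejection_rate(draw_normal, 20, 5000, alpha, rng)
power = rejection_rate(draw_bimodal, 20, 5000, alpha, rng)
print(f"nominal level = {alpha}, n = 20: size = {size:.3f}, power = {power:.3f}")
print("power below the nominal level:", power < alpha)
```

If the estimated power against the bimodal alternative comes out below the nominal 5% level, that is exactly the "biased" behaviour described above. Changing the mixture, the sample size, or the significance level is a quick way to see how sensitive the result is to those choices.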

Indeed, the poor performance of "omnibus" tests for normality (of the JB type) in small or medium-sized samples has been known at least since D'Agostino (1986). This issue was addressed in a Monte Carlo study by Urzua (1996). He suggested a modified test statistic, in which the asymptotic means and variances of the third and fourth moments are replaced with their finite-sample counterparts. Not surprisingly, this "adjusted" test (AJB) was found to out-perform the usual JB test. The AJB test is included in the study by Thadewald and Buning.
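As a rough illustration of the kind of adjustment involved, the sketch below compares the usual JB statistic with an "adjusted" version. I'm assuming here that the finite-sample counterparts are the exact moments of the sample skewness and kurtosis under normality for i.i.d. observations: Var(S) = 6(n-2)/((n+1)(n+3)), E(K) = 3(n-1)/(n+1), and Var(K) = 24n(n-2)(n-3)/((n+1)^2 (n+3)(n+5)). Treat it as a sketch of the idea, to be checked against Urzua (1996), rather than as his exact statistic for regression residuals. The function names are mine.

```python
def skewness_kurtosis(x):
    """Sample skewness and kurtosis, using central moments with divisor n."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def jarque_bera(x):
    """Usual JB statistic: asymptotic variances 6/n and 24/n, asymptotic mean 3."""
    n = len(x)
    skew, kurt = skewness_kurtosis(x)
    return n * (skew ** 2 / 6.0 + (kurt - 3.0) ** 2 / 24.0)


def finite_sample_moments(n):
    """Exact Var(S), E(K), Var(K) under normality (assumed i.i.d. data, n > 3)."""
    if n <= 3:
        raise ValueError("need n > 3 for a positive kurtosis variance")
    var_skew = 6.0 * (n - 2) / ((n + 1) * (n + 3))
    mean_kurt = 3.0 * (n - 1) / (n + 1)
    var_kurt = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    return var_skew, mean_kurt, var_kurt


def adjusted_jarque_bera(x):
    """Adjusted statistic: standardise S and K with finite-sample moments."""
    skew, kurt = skewness_kurtosis(x)
    var_skew, mean_kurt, var_kurt = finite_sample_moments(len(x))
    return skew ** 2 / var_skew + (kurt - mean_kurt) ** 2 / var_kurt


sample = [-2.0, -1.0, 0.0, 1.0, 2.0]  # tiny made-up sample, for illustration only
print(f"JB  = {jarque_bera(sample):.4f}")
print(f"AJB = {adjusted_jarque_bera(sample):.4f}")
```

As n grows, n times Var(S) tends to 6, E(K) tends to 3, and n times Var(K) tends to 24, so the two statistics coincide asymptotically. The difference matters only in small and medium-sized samples, which is where the JB test struggles.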

Mantalos (2010) also finds that the actual "size" of the JB test tends to be greater than the nominally assigned significance level if the latter is less than (about) 3%, while the converse is true if the nominal significance level exceeds 3%. This erratic behaviour is attributed to the lack of independence, in finite samples, between the skewness and kurtosis measures that are used in the construction of the JB test statistic. To be fair, other standard tests for normality suffer from the same problem. Mantalos (2011) attempts to deal with this issue by bootstrapping the critical values for the JB test for small-sample applications.
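One simple way to see the size distortion, and to sidestep it, is to simulate the critical value under the null instead of taking it from the chi-square(2) distribution. The sketch below does this by drawing standard normal samples, which amounts to a parametric bootstrap for i.i.d. data because the JB statistic is invariant to location and scale. It is not necessarily Mantalos's procedure, and the sample size, replication count, and function names are my own assumptions.

```python
import math
import random


def jarque_bera(x):
    """JB = n * (S^2 / 6 + (K - 3)^2 / 24) with divisor-n central moments."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n * (skew ** 2 / 6.0 + (kurt - 3.0) ** 2 / 24.0)


def simulated_critical_value(n, alpha, reps, rng):
    """Empirical (1 - alpha) quantile of JB over `reps` normal samples of size n."""
    stats = sorted(
        jarque_bera([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)
    )
    index = int(math.ceil((1.0 - alpha) * reps)) - 1
    index = max(0, min(reps - 1, index))  # keep the index inside the list
    return stats[index]


n, alpha = 20, 0.05
rng = random.Random(2014)  # fixed seed so the run is reproducible
asymptotic = -2.0 * math.log(alpha)  # exact chi-square(2) critical value
simulated = simulated_critical_value(n, alpha, 5000, rng)
print(f"n = {n}, alpha = {alpha}")
print(f"asymptotic chi-square(2) critical value: {asymptotic:.3f}")
print(f"simulated finite-sample critical value:  {simulated:.3f}")
```

Re-running this with a smaller alpha, say 0.01, is a quick way to check the claim that the direction of the size distortion depends on the nominal significance level.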

As a follow-up to my earlier post, Mark Salmon emailed me:
"I saw the normality JB test blog and felt the need to push another literature under your eyes. The original form of the JB test is not immune to estimation error/parameter uncertainty in particular forms- this has generated a small literature exploring its limitations and more robust tests which needs more exposure I feel given the dominance that the JB test has in the discipline and software packages. I have attached a couple of papers that I have in my folder but there are more recent papers on the topic I know."
The papers that Mark referred to are those of Fiorentini et al. (2004), and Bontemps and Meddahi (2005). The first of these shows that the JB test is still applicable when used in the context of (most) GARCH-M models. The second paper proposes an alternative testing procedure, based on GMM estimation. One advantage of this approach is that it is readily made robust to heteroskedasticity and autocorrelation.

Mark was too modest to mention his own work on this general topic (Kiefer and Salmon, 1983).

Another paper that should be mentioned here is that of Frain (2006), where the JB test is found to be lacking in power when the sample size is 50 or less.

Finally, if you're a time-series econometrician, you might ask: "How does the JB test perform in the context of a spurious regression?" That's one of the questions that I addressed in Giles (2007) - see here for an earlier post on this. The short answer is that the JB test statistic diverges in distribution as the sample size grows. This means that even if the errors are in fact normally distributed, the null hypothesis of normality will be rejected, increasingly often, because of the non-stationarity of the data. That's not good news, but the problem lies not with the test, but with the researcher who fails to detect the unit roots!
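The spurious regression result is easy to illustrate by simulation. In the sketch below, two independent driftless random walks with standard normal innovations are generated, one is regressed on the other by OLS (with an intercept), and the JB test is applied to the residuals using the asymptotic 5% critical value. The sample sizes, the number of replications, and the function names are my own choices for illustration, not the set-up used in Giles (2007).

```python
import math
import random


def jarque_bera(x):
    """JB = n * (S^2 / 6 + (K - 3)^2 / 24) with divisor-n central moments."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n * (skew ** 2 / 6.0 + (kurt - 3.0) ** 2 / 24.0)


def random_walk(n, rng):
    """Driftless random walk with N(0, 1) innovations, started at zero."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def ols_residuals(y, x):
    """Residuals from the simple regression of y on an intercept and x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return [yi - intercept - slope * xi for xi, yi in zip(x, y)]


def spurious_rejection_rate(n, reps, alpha, rng):
    """Share of replications in which JB rejects normality of the residuals."""
    crit = -2.0 * math.log(alpha)  # exact chi-square(2) critical value
    rejections = 0
    for _ in range(reps):
        y, x = random_walk(n, rng), random_walk(n, rng)  # independent I(1) series
        if jarque_bera(ols_residuals(y, x)) > crit:
            rejections += 1
    return rejections / reps


rng = random.Random(2007)  # fixed seed so the run is reproducible
rates = {n: spurious_rejection_rate(n, 500, 0.05, rng) for n in (25, 100, 400)}
for n, rate in rates.items():
    print(f"n = {n:3d}: rejection rate = {rate:.3f}")
```

The innovations here are normal by construction, so any tendency for the rejection rate to climb with n reflects the non-stationarity of the data, not non-normality of the underlying shocks.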

So, what are the main take-away messages from this post?
  1. Be very careful when using the JB test in small-sample situations.
  2. In such cases, the significance level of the test can be distorted, either upwards or downwards, relative to what you think it is.
  3. The JB test can have very low power in finite samples, and the test can even be "biased".
  4. Consider using alternative tests if there is any suspicion that the regression errors may be autocorrelated or heteroskedastic.


Bontemps, C. and N. Meddahi, 2005. Testing normality: A GMM approach. Journal of Econometrics, 124, 149-186.

D'Agostino, R. B., 1986. Tests for the normal distribution. In R. B. D'Agostino and M. A. Stephens, eds., Goodness of Fit Techniques. Marcel Dekker, New York, 367-419.

Fiorentini, G., E. Sentana, and G. Calzolari, 2004. On the validity of the Jarque-Bera normality test in conditionally heteroskedastic dynamic regression models. Economics Letters, 83, 307-312.

Frain, J. C., 2006. Small sample power of tests of normality when the alternative is an alpha-stable distribution. Mimeo.

Giles, D.E.A., 2007. Spurious regressions with time-series data: Further asymptotic results. Communications in Statistics, Theory and Methods, 36, 967-979.

Jarque, C. M. and A. K. Bera, 1987. A test for normality of observations and regression residuals. International Statistical Review, 55, 163-172.

Kiefer, N. M. and M. Salmon, 1983. Testing normality in econometric models. Economics Letters, 11, 123-127.

Mantalos, P., 2010. Robust critical values for the Jarque-Bera test for normality. JIBS Working Papers 2101-08, Jönköping International Business School, Jönköping University.

Mantalos, P., 2011. The three different measures of the sample skewness and kurtosis and the effects to the Jarque-Bera test for normality. International Journal of Computational Economics and Econometrics, 2, 47-62.

Thadewald, T. and H. Buning, 2004. Jarque-Bera test and its competitors for testing normality - A power comparison. Discussion Paper Economics 2004/9, School of Business and Economics, Free University of Berlin.

Urzua, C. M., 1996. On the correct use of omnibus tests for normality. Economics Letters, 53, 247-251.

Wurtz, D. and H. G. Katzgraber, 2005. Precise finite-sample quantiles of the Jarque-Bera adjusted Lagrange multiplier test. Mimeo., Institute for Theoretical Physics, Swiss Federal Institute of Technology.

© 2014, David E. Giles
