## Sunday, March 9, 2014

### Testing for Multivariate Normality

In a recent post I commented on the connection between the multivariate normal distribution and marginal distributions that are normal. Specifically, the latter do not necessarily imply the former.

Suppose that we have several variables which we think may have a joint distribution that's normal. We could test each of the variables for normality, separately, perhaps using the Jarque-Bera LM test. If the null hypothesis of normality was rejected for one or more of the variables, this could be taken as evidence against multivariate normality. However, if normality couldn't be rejected for any of the variables, this wouldn't tell us anything about their joint distribution.
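As a rough illustration (in Python, with simulated data — the post itself uses no code), the variable-by-variable approach might look like this, using `scipy.stats.jarque_bera` on each marginal:

```python
# Sketch: univariate Jarque-Bera tests applied to each marginal of a
# simulated bivariate sample. Rejection for any variable is evidence
# against multivariate normality; non-rejection for all of them is not
# evidence *for* it.
import numpy as np
from scipy.stats import jarque_bera

rng = np.random.default_rng(42)
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[1.0, 0.5],
                                 [0.5, 1.0]],
                            size=500)

results = [jarque_bera(X[:, j]) for j in range(X.shape[1])]
for j, (stat, pval) in enumerate(results):
    print(f"variable {j}: JB = {stat:.3f}, p = {pval:.3f}")
```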

What we need is a test for multivariate normality itself. Let's see what's available.

Recall that in the case of a single random variable, the Jarque-Bera (1987) test (following earlier work by Bowman and Shenton, and others) is a test of the hypothesis that the skewness of the underlying distribution is zero, and the kurtosis of that distribution is three. So, one way to approach testing for multivariate normality might be to test if some multivariate measure of skewness is zero, and at the same time some suitable measure of multivariate kurtosis is three.
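To make the skewness-and-kurtosis logic concrete, here is a minimal sketch of the univariate JB statistic built directly from those two moment-based measures (the function name is mine, for illustration only):

```python
# Minimal sketch of the Jarque-Bera statistic: n*(S^2/6 + (K-3)^2/24),
# where S is the moment-based sample skewness and K the sample kurtosis.
# Under normality, S -> 0 and K -> 3, and the statistic is
# asymptotically chi-square with 2 degrees of freedom.
import numpy as np
from scipy.stats import chi2

def jb_statistic(x):
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    m2 = np.mean(d**2)
    skew = np.mean(d**3) / m2**1.5      # should be near 0 under normality
    kurt = np.mean(d**4) / m2**2        # should be near 3 under normality
    jb = n * (skew**2 / 6.0 + (kurt - 3.0)**2 / 24.0)
    return jb, chi2.sf(jb, df=2)        # asymptotic chi-square(2) p-value

rng = np.random.default_rng(0)
x_sample = rng.standard_normal(1000)
stat, pval = jb_statistic(x_sample)
```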

The question is, how might we define these two measures in the multivariate case? It's not as simple as you might think, and there's certainly more than one way to proceed. For example, Mardia (1970) and Srivastava (1984) provide different definitions of skewness and kurtosis in the multivariate case.

You can find a really clear discussion of these measures in Koizumi et al. (2009). These authors then present multivariate counterparts to the Jarque-Bera test, based on these alternative definitions of skewness and kurtosis.

Their simulation experiments indicate that these asymptotic tests need to be used with caution if the sample size is relatively small - but that's true of the usual (univariate) Jarque-Bera test as well, as I'll be discussing further in a later post.

The work by Koizumi et al. is most instructive, and it suggests an area for important further research, if you happen to be looking for something to fill your spare time.

References

Jarque, C. M. and A. K. Bera, 1987. A test for normality of observations and regression residuals. International Statistical Review, 55, 165-172.

Koizumi, K., N. Okamoto, and T. Seo, 2009. On Jarque-Bera tests for assessing multivariate normality. Journal of Statistics: Advances in Theory and Applications, 1, 207-220.

Mardia, K. V., 1970. Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519-530.

Srivastava, M. S., 1984. A measure of skewness and kurtosis and a graphical method for assessing multivariate normality. Statistics and Probability Letters, 2, 263-267.

1. Normality tests are used routinely for model validation in the tradition I have been trained in. This can be seen in OxMetrics, which follows that tradition and therefore reports normality tests, including a multivariate version, for the residuals when fitting a VAR. "Residual analysis" is also central to the cointegration package CATS in RATS. The multivariate normality test used in OxMetrics and CATS is that of Doornik and Hansen (2008), from the Oxford Bulletin, and that article seems to build on the same work as Koizumi et al. By the way, Hansen is a professor at my department and one of the original authors of CATS.

2. There is also the paper by Doornik and Hansen, An Omnibus Test for Univariate and Multivariate Normality, available at: http://www.doornik.com/research/normal2.pdf
Eric de Souza

1. Thanks - there are no doubt others too.

3. Multivariate normality? Forget it. The answer is always "No".

The real question is "If I model this distribution as multivariate normal, will I be misled in the inferences I want to make about it?" The answer to that question is, as always, "It depends on what questions you want to ask".

Putting that to one side, the property that characterises the multivariate normal more simply than any other is that the marginal distribution of *every* linear function of the variables has to be univariate normal. This seems to open the way to some reasonable checks, at least, using univariate tests of normality and simulation; alternatively, looking at univariate test statistics and finding, by optimisation, the worst-case value over the multivariate distribution, and using that as your test statistic for the multivariate case. Finding even the null distribution might be a bit of a chore, though.
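The projection idea in the comment above can be sketched informally (this is my own illustration, and not a properly sized test — the minimum p-value over many projections is not uniformly distributed under the null):

```python
# Informal check based on the projection characterisation: every linear
# combination of a multivariate normal vector is univariate normal, so
# we apply the Jarque-Bera test to many random projections. The example
# data have normal marginals but a non-normal joint distribution.
import numpy as np
from scipy.stats import jarque_bera

rng = np.random.default_rng(1)

def min_projection_pvalue(X, n_proj=200):
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    pvals = []
    for _ in range(n_proj):
        w = rng.standard_normal(p)       # random direction on the sphere
        w /= np.linalg.norm(w)
        pvals.append(jarque_bera(X @ w)[1])
    return min(pvals)

# Standard counterexample: replace Z2 by |Z2|*sign(Z1). Each marginal
# is still N(0,1), but the joint distribution is not bivariate normal.
Z = rng.standard_normal((2000, 2))
Z[:, 1] = np.abs(Z[:, 1]) * np.sign(Z[:, 0])
minp = min_projection_pvalue(Z)
```

Projections near the axes still look normal here, but diagonal projections are markedly platykurtic, so the minimum p-value across directions is tiny.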

4. A. L. Nagar passed away last month. If and when you find time, could you kindly write about his contributions to econometrics?

1. Sriram - thanks for the comment. Yes, I have a piece dedicated to Nagar coming up shortly.