Econometrics Beat: Dave Giles' Blog: Count Data & the Hermite Distribution

Friday, April 13, 2012

Count Data & the Hermite Distribution

One of the limitations of the usual discrete distributions that we use when modeling "count data" is that they can't allow for multi-modality (except in a trivial manner). So, there's no use in trying to model multi-modal data using a Poisson regression model, or a Negative Binomial regression model, for example.

However, such data occur frequently in practice. So, what options are open to us?

Here's an example of the type of data that I'm referring to:

A while back, I suggested the use of the so-called Hermite distribution (Kemp and Kemp, 1965). This distribution is very flexible in terms of its ability to allow for over-dispersion in the data, as well as being able to model multi-modal samples. Surprisingly, I could find no examples, in any area of application, where covariates had been introduced into the model - in the way that we do with our standard count data regressions. Neither could I find any applications of the distribution itself to economic data.

The Hermite distribution is a generalized Poisson distribution, taking its name from the fact that its probabilities and factorial moments can be expressed in terms of the coefficients of (modified) Hermite polynomials. It is a special case of the Poisson-binomial distribution (with n = 2), and it is closely related to the bivariate Poisson distribution. An Hermite variate arises as the sum of an ordinary Poisson variate and an independent Poisson ‘doublet’ variate; and the distribution of the sum of a finite number of correlated Poisson variates is also Hermite (McKendrick, 1926; Maritz, 1952).
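To make the "Poisson plus doublet" representation concrete, here's a minimal Python sketch. It isn't taken from any of the papers cited in this post. It assumes that Y = X1 + 2*X2, where X1 and X2 are independent Poisson variates with means a and b, so that the probability generating function is exp[a(s-1) + b(s^2-1)]. Differentiating that function gives the recursion k*P(k) = a*P(k-1) + 2b*P(k-2), which the code uses. The function name hermite_pmf, the symbols a and b, and the parameter values are my own choices for illustration.

```python
import math

def hermite_pmf(a, b, kmax):
    """Return [P(Y=0), ..., P(Y=kmax)] for Y = X1 + 2*X2, where
    X1 ~ Poisson(a) and X2 ~ Poisson(b) are independent (an assumed
    parameterization, chosen for illustration)."""
    if a < 0 or b < 0:
        raise ValueError("a and b must be non-negative")
    p = [math.exp(-(a + b))]          # P(0): both Poisson variates are zero
    if kmax >= 1:
        p.append(a * p[0])            # P(1): only X1 can contribute a single count
    for k in range(2, kmax + 1):
        # Recursion implied by G'(s) = (a + 2*b*s) * G(s)
        p.append((a * p[k - 1] + 2.0 * b * p[k - 2]) / k)
    return p

# Illustrative values only: a small 'a' and larger 'b' favour even counts
probs = hermite_pmf(a=0.1, b=1.5, kmax=60)
mean = sum(k * pk for k, pk in enumerate(probs))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))

for k in range(7):
    print(k, round(probs[k], 4))
print("mean:", round(mean, 4), "variance:", round(var, 4))
```

Under this parameterization the mean is a + 2b and the variance is a + 4b, so the variance exceeds the mean whenever b > 0. With the illustrative values above, the probabilities at the small even counts sit well above those at the adjacent odd counts, which is exactly the sort of multi-modality that a Poisson or Negative Binomial distribution can't reproduce.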

In practice, maximum likelihood estimation of the Hermite regression model is straightforward, as I describe in Giles (2007, 2010). In EViews, this can be implemented by using a LOGL object. In Giles (2010) I illustrated this with an application to the number of currency and monetary crises in IMF member countries. The empirical distribution of the data is:

As you can see, the sample data are over-dispersed and multi-modal. The covariates that I used in my model were dummy variables to allow for the country's income level; and to allow for experiencing one or more crises under a (de jure) intermediate exchange rate regime, or under a (de jure) floating exchange rate regime. The results based on the Hermite distribution clearly dominated those based on Poisson and Negative Binomial regression models.
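For readers who don't use EViews, here's a rough Python sketch of the kind of log-likelihood that a numerical optimizer would maximize. I should stress that the way the covariates enter is an assumption made purely for illustration, and it is not necessarily the specification used in Giles (2007, 2010): the Poisson component has mean a_i = exp(x_i'beta), and the 'doublet' component has a constant mean b = exp(gamma). The function names hermite_logpmf and hermite_negloglik are hypothetical.

```python
import math

def hermite_logpmf(y, a, b):
    """log P(Y = y) for Y = X1 + 2*X2, X1 ~ Poisson(a), X2 ~ Poisson(b).
    Requires a > 0; a sketch only, with no guard against underflow
    when a + b is very large."""
    p_prev2 = math.exp(-(a + b))      # P(0)
    if y == 0:
        return math.log(p_prev2)
    p_prev1 = a * p_prev2             # P(1)
    for k in range(2, y + 1):
        # k*P(k) = a*P(k-1) + 2*b*P(k-2)
        p_prev2, p_prev1 = p_prev1, (a * p_prev1 + 2.0 * b * p_prev2) / k
    return math.log(p_prev1)

def hermite_negloglik(params, y, X):
    """Negative log-likelihood. params holds beta (one coefficient per
    column of X) followed by a scalar gamma. ASSUMED specification:
    a_i = exp(x_i'beta), b = exp(gamma)."""
    n_beta = len(X[0])
    beta, gamma = params[:n_beta], params[n_beta]
    b = math.exp(gamma)
    total = 0.0
    for yi, xi in zip(y, X):
        a_i = math.exp(sum(bj * xj for bj, xj in zip(beta, xi)))
        total += hermite_logpmf(yi, a_i, b)
    return -total

# Made-up data: an intercept and one dummy variable
y = [0, 2, 2, 4, 1, 0, 6, 2]
X = [[1, 0], [1, 0], [1, 1], [1, 1], [1, 0], [1, 0], [1, 1], [1, 1]]
print(hermite_negloglik([0.0, 0.0, 0.0], y, X))
```

A function like hermite_negloglik can be handed to a general-purpose minimizer (for example, scipy.optimize.minimize) with a vector of starting values. The exponential links keep a_i and b positive without the need for constrained optimization.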

When you're faced with multi-modal count data, you might think about modelling them using an Hermite regression. The Working Paper version of Giles (2010) is available here.

References

Giles, D.E.A., 2007. Modeling inflated count data. In L. Oxley and D. Kulasiri (eds.), MODSIM 2007 International Congress on Modelling and Simulation, Modelling and Simulation Society of Australia and New Zealand, Christchurch, N.Z., 919-925.

Giles, D.E.A., 2010. Hermite regression analysis of multi-modal count data. Economics Bulletin, 30, 2936-2945.

Kemp, C. D. and A. W. Kemp, 1965. Some properties of the ‘Hermite’ distribution. Biometrika, 52, 381-394.

Maritz, J. S., 1952. Note on a certain family of discrete distributions. Biometrika, 39, 196-198.

McKendrick, A. G., 1926. Applications of mathematics to medicine. Proceedings of the Edinburgh Mathematical Society, 44, 98-130.