## Thursday, March 15, 2012

### Goodness-of-Fit Testing With Discrete, Circular Data

Testing whether a sample of data comes from a specific distribution is a central problem in statistics. This sort of "goodness-of-fit" testing is also important in econometrics, of course. Most goodness-of-fit tests involve "comparing" the empirical distribution function for the sample data with an hypothesized theoretical distribution. The tests rely on the Glivenko-Cantelli Theorem, which states that the maximum (vertical) "gap" between the empirical and theoretical c.d.f.'s will vanish (almost surely), everywhere on the support of the distribution, as the sample size grows without limit.
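To make the "maximum gap" idea concrete, here is a minimal simulation sketch in Python (standard library only). The unit exponential distribution, the seed, and the sample sizes are arbitrary choices made for illustration, and the helper name `max_cdf_gap` is hypothetical. The printed gaps should shrink towards zero as the sample size grows.

```python
import math
import random

def max_cdf_gap(sample, cdf):
    """Largest vertical gap between the empirical c.d.f. of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical c.d.f. jumps from (i-1)/n to i/n at the i-th order
        # statistic, so the gap is checked on both sides of the jump.
        gap = max(gap, i / n - f, f - (i - 1) / n)
    return gap

def exponential_cdf(x):
    """c.d.f. of the unit exponential distribution (an illustrative choice)."""
    return 1.0 - math.exp(-x)

rng = random.Random(0)  # fixed seed so that the run is reproducible
gaps = {}
for n in (50, 500, 5000, 50000):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    gaps[n] = max_cdf_gap(sample, exponential_cdf)
    print(f"n = {n:6d}   max gap = {gaps[n]:.4f}")
```

The data here are drawn from the hypothesized distribution itself, so the shrinking gap is what the theorem predicts. With data from a different distribution, the gap would settle at a positive value instead.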

Some such tests are based on this "maximum gap", while others are based on the area between the empirical and theoretical c.d.f.'s. Examples of the first type of test include those associated with the names of Kolmogorov, Smirnov, Kuiper and Lilliefors. Examples of the second type include the tests of Anderson and Darling, Cramér and von Mises, and Watson (whose statistic is a modification of the Cramér-von Mises one).
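As a rough illustration, the sketch below computes the usual textbook forms of several of these statistics for a fully specified continuous null distribution. It is a sketch under stated assumptions rather than a reproduction of any package's output: the function name `edf_statistics` is hypothetical, the Beta(2, 2) data are simulated, and no critical values or p-values are produced.

```python
import math
import random

def edf_statistics(sample, cdf):
    """Textbook EDF statistics for a fully specified, continuous null c.d.f."""
    # Probability integral transforms, sorted; under the null these are U(0,1).
    u = sorted(cdf(x) for x in sample)
    n = len(u)
    d_plus = max((i + 1) / n - ui for i, ui in enumerate(u))
    d_minus = max(ui - i / n for i, ui in enumerate(u))
    # Cramér-von Mises W^2: a sum of squared deviations from (2i-1)/(2n).
    w2 = 1.0 / (12 * n) + sum(
        (ui - (2 * i + 1) / (2 * n)) ** 2 for i, ui in enumerate(u)
    )
    u_bar = sum(u) / n
    # Anderson-Darling A^2: needs every u strictly inside (0, 1).
    a2 = -n - sum(
        (2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
        for i in range(n)
    ) / n
    return {
        "D": max(d_plus, d_minus),             # Kolmogorov-Smirnov
        "V": d_plus + d_minus,                 # Kuiper
        "W2": w2,                              # Cramér-von Mises
        "U2": w2 - n * (u_bar - 0.5) ** 2,     # Watson
        "A2": a2,                              # Anderson-Darling
    }

rng = random.Random(1)  # fixed seed, illustrative data only
data = [rng.betavariate(2.0, 2.0) for _ in range(100)]
# Null hypothesis: uniform on (0, 1), whose c.d.f. is the identity there.
stats = edf_statistics(data, lambda x: x)
for name, value in stats.items():
    print(f"{name:>2} = {value:.4f}")
```

D and V depend only on the largest gaps above and below the theoretical c.d.f., whereas W2, U2 and A2 accumulate squared discrepancies over the whole sample.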

All of these tests are available in EViews. You select the series and then choose "View", "Descriptive Statistics & Tests", and then "Empirical Distribution Tests".

These tests are "distribution-free", at least asymptotically, if the data are continuous. However, if the data are discrete, this property of the tests is usually lost. If the data are continuous and "circular" (or directional), Watson's test and Kuiper's test are usually used. However, if the data are discrete and circular, some interesting issues arise.
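One reason Watson's and Kuiper's tests suit circular data is that their statistics do not depend on where the (arbitrary) origin of the circle is placed, whereas the Kolmogorov-Smirnov and Cramér-von Mises statistics do. The following minimal sketch illustrates this for continuous data tested against the circular uniform distribution. The von Mises sample, the seed, the shifts, and the helper name `circular_statistics` are all assumptions made for the example, and the sketch does not address the discrete case.

```python
import math
import random

def circular_statistics(u):
    """D, V, W^2 and U^2 for values u in [0, 1) against the uniform c.d.f."""
    u = sorted(u)
    n = len(u)
    d_plus = max((i + 1) / n - ui for i, ui in enumerate(u))
    d_minus = max(ui - i / n for i, ui in enumerate(u))
    w2 = 1.0 / (12 * n) + sum(
        (ui - (2 * i + 1) / (2 * n)) ** 2 for i, ui in enumerate(u)
    )
    u2 = w2 - n * (sum(u) / n - 0.5) ** 2
    return {"D": max(d_plus, d_minus), "V": d_plus + d_minus, "W2": w2, "U2": u2}

rng = random.Random(42)  # fixed seed, illustrative data only
two_pi = 2.0 * math.pi
# Simulated directions (radians), concentrated around pi.
angles = [rng.vonmisesvariate(math.pi, 1.0) % two_pi for _ in range(200)]
u = [(a / two_pi) % 1.0 for a in angles]  # map the circle onto [0, 1)

results = {}
for shift in (0.0, 0.25, 0.6):
    # Moving the origin of the circle is a rotation: add a constant, wrap at 1.
    rotated = [(x + shift) % 1.0 for x in u]
    results[shift] = circular_statistics(rotated)
    r = results[shift]
    print(f"shift = {shift:4.2f}  D = {r['D']:.4f}  V = {r['V']:.4f}  "
          f"W2 = {r['W2']:.4f}  U2 = {r['U2']:.4f}")
```

Across the three shifts, V and U2 should agree up to rounding error, while D and W2 generally change with the choice of origin.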

I have a paper (downloadable from here) that discusses this issue; shows how exact quantiles for a modified Watson test can be obtained for various distributions; evaluates the power of this test; and provides some illustrative applications.

Yesterday, I presented a seminar based on this paper. The presentation slides (in pdf format) are available here.

I hope that you find this material interesting.

1. I have been reading your blog and I wonder if you also post articles that deal with basic econometrics? Perhaps estimating supply/demand curves using real data (with an example file, e.g. for EViews) and total factor productivity, among others.

Thanks!

1. Thanks for the suggestion - I'll see what I can do!