## Thursday, July 16, 2015

### Questions About the Size and Power of a Test

Osman, a reader of this blog, sent a comment in relation to my recent post on the effects of temporal aggregation on t-tests, and the like. Rather than just bury it, with a short response, in the "Comments" section of that post, I thought I'd give it proper attention here.

"Thank you for this illustrative example. My question is not exactly related to the subject of your post. As you illustrated, the finite sample properties of tests are studied by investigating the size and power properties. You reported size distortions to assess the size properties of the test. My first question is about the level of the size distortions. How much distortions is need to conclude that a test is useless? Is there an interval that we can construct around a nominal size value to gauge the significance of distortions? Same type of questions can also be relevant for the power properties. The “size adjusted power” is simply rejection rates obtained when the DGP satisfies an alternative hypothesis. Although, the power property is used to compare alternative tests, we can still ask question regarding to the level of the power. As your power curve shows, the level of power also depends on the parameter value assumed under the alternative hypothesis. For example, when β= 0.8 the power is around 80% which means that the false null is rejected 80 times out of 100 times. Again, the question is that what should be the level of the power to conclude that the test has good finite sample properties?"

Let's look at Osman's questions.
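
As a rough illustration of the "interval around a nominal size" idea in Osman's first question, here is a minimal Python sketch. It is not taken from the original simulation experiment. It assumes a simple two-sided z-test on N(0, 1) data with known variance, and the function names `size_band` and `empirical_size` are made up for this example. The band uses the fact that, with R independent replications and a true rejection probability equal to the nominal size α, the empirical rejection rate has standard error sqrt(α(1 - α)/R).

```python
import math
import random
from statistics import NormalDist


def size_band(nominal, reps, level=0.95):
    """Band around the nominal size that is consistent with Monte Carlo
    sampling error alone (normal approximation to the binomial)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = math.sqrt(nominal * (1 - nominal) / reps)
    return nominal - z * se, nominal + z * se


def empirical_size(n, reps, nominal=0.05, seed=12345):
    """Rejection rate of a two-sided z-test of H0: mu = 0 when H0 is true.

    The DGP (i.i.d. N(0, 1), variance known) is an assumption chosen
    purely for illustration.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - nominal / 2)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z_stat = (sum(sample) / n) * math.sqrt(n)
        if abs(z_stat) > crit:
            rejections += 1
    return rejections / reps


low, high = size_band(0.05, 5000)
rate = empirical_size(n=20, reps=5000)
print(f"band: [{low:.4f}, {high:.4f}]  empirical size: {rate:.4f}")
```

An empirical size that falls outside such a band cannot be attributed to simulation noise alone. Whether a distortion of that magnitude matters in practice is a separate judgment that the band does not settle.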

## Thursday, July 9, 2015

### 'Student', on Kurtosis

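W. S. Gosset ('Student') provided this useful aid to help us remember the difference between platykurtic and leptokurtic distributions: a sketch of a platypus for platykurtic curves (β2 < 3, with shorter tails than the Normal) and of kangaroos, noted for "lepping", for leptokurtic curves (β2 > 3, with longer tails).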

('Student', 1927. Errors of routine analysis. Biometrika, 19, 151-164. See p. 160.)

Here, β2 is the fourth standardized moment of the distribution about its mean. The Normal distribution has β2 = 3.
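
To make the definition concrete, here is a small Python sketch that estimates β2 as m4 / m2², where mk is the k-th sample moment about the sample mean. The helper name `beta2` and the three simulated distributions are choices made for this illustration only. The population values are 1.8 for the uniform (platykurtic), 3 for the Normal, and 6 for the Laplace (leptokurtic).

```python
import random


def beta2(xs):
    """Sample version of the fourth standardized moment, m4 / m2**2."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2


rng = random.Random(2015)
n = 200_000

uniform = [rng.random() for _ in range(n)]
normal = [rng.gauss(0.0, 1.0) for _ in range(n)]
# The difference of two independent unit exponentials is Laplace distributed.
laplace = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]

b2_uniform = beta2(uniform)
b2_normal = beta2(normal)
b2_laplace = beta2(laplace)
print(f"uniform: {b2_uniform:.2f}  normal: {b2_normal:.2f}  laplace: {b2_laplace:.2f}")
```

The sample estimates will differ a little from the population values, and the estimate for the heavy-tailed Laplace case is the noisiest of the three.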

The appropriate definition of "kurtosis" for unimodal distributions has been the subject of considerable discussion in the statistical literature. Should it be based on the characteristics of the tails of the distribution, on the shape of the density around its mode, or on both?

## Wednesday, July 8, 2015

### Parallel Computing for Data Science

Hot off the press, Norman Matloff's book, Parallel Computing for Data Science: With Examples in R, C++ and CUDA (Chapman and Hall/CRC Press, 2015), should appeal to a lot of the readers of this blog.

The book's coverage is clear from the following chapter titles:

1. Introduction to Parallel Processing in R
2. Performance Issues: General
3. Principles of Parallel Loop Scheduling