In a recent post I raised the point about the spurious degree of precision that is often encountered with reported regression results. So, here's a challenge for you - how many decimal places (or maybe significant digits) are appropriate when reporting OLS regression results?
Of course, the answer will depend on the data that are used, and the precision to which they are available in the first place, so let's be specific. Consider the following sample of n = 10 observations:
| y | x |
|---|---|
| 5.02 | 0.033 |
| 6.1 | 0.21 |
| 7.34 | 0.1234 |
| 8.2 | 0.193 |
| 9.0 | 0.2003 |
| 11.123 | 0.41 |
| 13.2 | 0.661 |
| 14.99 | 0.85 |
| 16.01 | 1.01 |
| 18.7 | 1.67 |
These data are in a text file on the Data page that goes with this blog.
Suppose that I estimate the following simple regression model by OLS:
y_i = α + β x_i + ε_i ;  i = 1, 2, 3, ..., 10.
Here is the output that's obtained when the model is estimated using EViews:
Here are the results when you use the (free!) gretl package:
(The EViews and gretl files are available on this blog's Code page.)
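For readers without either package to hand, the same fit can be sketched with the textbook closed-form OLS formulas. This is only an illustration, not the EViews or gretl computation: the function name `ols_simple` and the dictionary keys are my own choices, and the standard errors assume the usual homoskedastic formula with n - 2 degrees of freedom.

```python
import math

# The n = 10 observations from the table in this post.
y = [5.02, 6.1, 7.34, 8.2, 9.0, 11.123, 13.2, 14.99, 16.01, 18.7]
x = [0.033, 0.21, 0.1234, 0.193, 0.2003, 0.41, 0.661, 0.85, 1.01, 1.67]


def ols_simple(y, x):
    """Closed-form OLS for y_i = alpha + beta * x_i + e_i (hypothetical helper)."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sums of squares and cross-products about the sample means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)

    beta = sxy / sxx
    alpha = y_bar - beta * x_bar

    # Residual sum of squares and the unbiased error-variance estimate.
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)

    return {
        "alpha": alpha,
        "beta": beta,
        "se_alpha": math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx)),
        "se_beta": math.sqrt(s2 / sxx),
        "r2": 1.0 - rss / syy,
    }


results = ols_simple(y, x)
for name, value in results.items():
    # repr() shows every digit the floating-point result carries.
    print(f"{name:>8}: {value!r}")
```

Printing with `repr()` is deliberate: it displays far more digits than the data could possibly support, which is exactly the issue this post is asking about.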
I have a slight preference for the gretl output - can you see why?
Given the precision of the original data, what level of numerical precision (number of decimal places) do you think is really appropriate here when reporting:
- The estimated regression coefficients?
- The standard errors?
- The coefficient of determination (R²)?
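Since "decimal places" and "significant digits" are not the same thing, here is a small sketch of the difference, applied to two values taken from the sample above. The helper `round_sig` is a hypothetical name of my own, not a built-in, and the sketch takes no position on which convention is the right one.

```python
import math


def round_sig(value, digits):
    """Round value to the given number of significant digits (hypothetical helper)."""
    if value == 0:
        return 0.0
    # Position of the leading digit: 1 for 11.123, -1 for 0.1234.
    exponent = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - exponent)


# Two observations from the sample: one y value and one x value.
for value in (11.123, 0.1234):
    print(
        value,
        "| 2 decimal places:", round(value, 2),
        "| 2 significant digits:", round_sig(value, 2),
    )
```

The two rules agree for numbers between 0.1 and 1, but they diverge as soon as the magnitude changes, so the choice matters when coefficients and standard errors are on different scales.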
I'll look forward to your comments!
© 2011, David E. Giles
You prefer the gretl output because it gives the p-value in scientific notation?
For the coefficients, I would use precision based on the standard errors. But I wouldn't really know how to think about the precision on the standard error. For the R^2, I think the standard error of the regression vis a vis the standard error of the dependent variable might provide some insight.
I can't think of a good reason to report a limited number of decimal places (apart from space limitations).
I do agree with Owen. But for some reason I don't go below 2 decimal places.
I think that the "general" precision of the data in terms of number of significant digits must guide the number of sig digits to use in the reported coefficient estimates. In this case, it seems to me that generally the number of sig digits in the data is three, so you should report the coefficients as 6.42 and 8.48 and the std errors as 0.69 and 0.95.
@all: I'm going to let this one run for a few days before telling you what I have in mind!
1. I'm a bit confused how the precision of the *data* would guide the reporting precision of the coefficients (or at least how there can be a simple relation between the two). If the data were very imprecise, they would start to look discrete, so OLS may not be appropriate; but with an appropriate model we could still get precise estimates of beta. I would think the SEs are a better guide to precision.
In reverse order:
2. R squared should be reported to 2 decimal places (or 1 percentage point). 1 dp would not be enough, e.g. we could not distinguish between a model explaining 66% and 74% of variance.
3. As a general rule the SEs should be reported to 2 significant digits for similar reasons.
4. From the SEs we can calculate the 95% confidence interval around beta. In the above example it is (6.30, 10.67) using t(8). This is wide (imprecise), therefore we may choose to report our central estimate of beta without any dps, simply as 8. Or we may want to let the reader calculate their own CIs, and report beta to one or two dps (or could just report the CI to, say, one DP).
@all: O.K., I've posted a (partial) answer at:
http://davegiles.blogspot.com/2011/12/reported-accuracy-for-regression.html