Suppose that the random variables X1 and X2 are jointly distributed as bivariate Normal, with means of θ1 and θ2, variances of σ1² and σ2² respectively, and a correlation coefficient of ρ.
In this post we're going to be looking at the distribution of the ratio, W = (X1 / X2).
You probably know that if X1 and X2 are independent standard normal variables, then W follows a Cauchy distribution. This will emerge as a special case in what follows.
The more general case that we're concerned with is of interest to econometricians for several reasons.
For example:
- In the case of a regression model of the form, yi = α + β xi + εi, the elasticity between yi and xi is ηi = (β xi / yi). If the errors are normally distributed and this elasticity is estimated by replacing β with its OLS estimator, b, the elasticity estimator will be a ratio of (correlated) normal random variables.
- If we have a regression model of the form, yi = α + β xi + εi, the semi-elasticity of yi with respect to xi (that is, the marginal effect of xi on log(yi)) is mi = (β / yi). If the errors are normally distributed and this semi-elasticity is estimated by replacing β with its OLS estimator, b, the resulting estimator will also be a ratio of (correlated) normal random variables.
- Consider the simple consumption function, Ct = α + β Yt + γ Ct-1 + εt, where Y denotes personal disposable income. The long-run marginal propensity to consume is mpc = β / (1 - γ). If the errors are normally distributed and mpc is estimated by replacing β and γ by their OLS estimates, this estimator of mpc is a ratio of (correlated) normal random variables.
- Suppose that we have a two-equation simultaneous equations model, with the following structural form (SF):
y1t = γ y2t + u1t
y2t = β xt + u2t
The restricted reduced form (RRF) of the model is:
y1t = π1 xt + v1t
y2t = π2 xt + v2t ,
where π1 = (β γ); and π2 = β.
Suppose that the model's errors are normally distributed, with a non-zero (contemporaneous) covariance, and we estimate the RRF coefficients by OLS, yielding estimators π1* and π2*. Then the implied estimator of γ is γ* = (π1* / π2*). This is also the ratio of two (correlated) normal random variables. (e.g., see Zellner, 1978, p.133).
So, if we're agreed that instances of such (random) ratios arise in econometrics, let's get back to thinking about their distribution.
Why is this important?
Well, to give just one example, if we have constructed one of these ratio (point) estimators, and we want to construct a confidence interval, then we'll need to know the underlying sampling distribution for the estimator. In general, the latter will not be normal in finite samples, even though the regression errors are assumed to be normal in each of the examples above.
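To illustrate, here's a small Monte Carlo sketch (entirely my own, with arbitrary parameter and error-covariance values) based on the simultaneous-equations example above. It simulates the reduced-form OLS estimators π1* and π2* and the implied ratio γ* = π1*/π2*, whose simulated sampling distribution can then be inspected directly:

```python
import numpy as np

rng = np.random.default_rng(123)
n, reps = 25, 10_000
gamma, beta = 0.5, 0.1                       # illustrative structural parameters (assumed)
x = np.arange(1, n + 1, dtype=float)         # fixed exogenous regressor

gamma_star = np.empty(reps)
for r in range(reps):
    # contemporaneously correlated, normal structural errors
    u = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)
    y2 = beta * x + u[:, 1]
    y1 = gamma * y2 + u[:, 0]
    pi1_star = (x @ y1) / (x @ x)            # OLS on each reduced-form equation
    pi2_star = (x @ y2) / (x @ x)            # (no intercepts, matching the SF above)
    gamma_star[r] = pi1_star / pi2_star

# Compare the simulated quantiles of gamma* with those of a normal distribution.
print(np.percentile(gamma_star, [1, 25, 50, 75, 99]))
```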
Notice that W = (X1 / X2) can take values anywhere on the real line. The density function for W was first derived by Geary (1930), but only for the special case where θ1 = θ2 = 0. Fieller (1932) derived the density for the case where X1 and X2 have non-zero means. Later, Hinkley (1969) derived the corresponding distribution function and considered the merits of an approximation to the latter.
Let's look at Fieller's result, taken here from Hinkley (1969, p. 636):
So, the expression for the density of W, f(w), is a bit of a mess!
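The formula doesn't reproduce well here, so instead here's a numerical sketch of it (my own code, not taken from Fieller or Hinkley themselves), written in Hinkley's a(w), b(w), c, d(w) notation. The function and argument names are just my own choices, and you should check the expressions against Hinkley (1969, p. 636) before relying on them:

```python
import numpy as np
from scipy.stats import norm

def ratio_density(w, theta1, theta2, sigma1, sigma2, rho):
    """Density of W = X1/X2 when (X1, X2) is bivariate normal with means
    theta1, theta2, standard deviations sigma1, sigma2, and correlation rho,
    in the a(w), b(w), c, d(w) notation of Hinkley (1969, p. 636)."""
    w = np.asarray(w, dtype=float)
    a = np.sqrt(w**2 / sigma1**2 - 2.0 * rho * w / (sigma1 * sigma2) + 1.0 / sigma2**2)
    b = (theta1 * w / sigma1**2
         - rho * (theta1 + theta2 * w) / (sigma1 * sigma2)
         + theta2 / sigma2**2)
    c = (theta1**2 / sigma1**2
         - 2.0 * rho * theta1 * theta2 / (sigma1 * sigma2)
         + theta2**2 / sigma2**2)
    d = np.exp((b**2 - c * a**2) / (2.0 * (1.0 - rho**2) * a**2))
    k = np.sqrt(1.0 - rho**2)
    term1 = (b * d / (np.sqrt(2.0 * np.pi) * sigma1 * sigma2 * a**3)
             * (norm.cdf(b / (k * a)) - norm.cdf(-b / (k * a))))
    term2 = k / (np.pi * sigma1 * sigma2 * a**2) * np.exp(-c / (2.0 * (1.0 - rho**2)))
    return term1 + term2
```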
However, as Hinkley notes, as (θ2 / σ2) → ∞ (i.e., as Pr[X2 > 0] → 1), the distribution function F(w) → Φ[(θ2 w - θ1) / (σ1 σ2 a(w))]. So, under certain circumstances a normal approximation to the distribution of W may be appropriate. Of course, θ2 and σ2 are unobservable, so this may not be of too much help.
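In code, the approximation is just a normal c.d.f. evaluated at a standardised argument (again, a sketch of my own, using the same a(w) as in the density function above):

```python
import numpy as np
from scipy.stats import norm

def ratio_cdf_normal_approx(w, theta1, theta2, sigma1, sigma2, rho):
    """Hinkley's approximation F(w) ~ Phi[(theta2*w - theta1)/(sigma1*sigma2*a(w))];
    reasonable when theta2/sigma2 is large, i.e. when Pr[X2 > 0] is close to 1."""
    w = np.asarray(w, dtype=float)
    a = np.sqrt(w**2 / sigma1**2 - 2.0 * rho * w / (sigma1 * sigma2) + 1.0 / sigma2**2)
    return norm.cdf((theta2 * w - theta1) / (sigma1 * sigma2 * a))
```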
Let's evaluate f(w) for a few situations, and see what it looks like.
Specifically, we'll look at the simple regression model, yi = α + β zi + εi, where the errors are independently distributed as N[0, σ²]. We'll let X1 and X2 be the OLS estimators of α and β respectively. So W is an estimator of the negative of the intercept of the regression line with the z axis.
Recall that X1 and X2 are unbiased estimators of α (θ1) and β (θ2). In addition, if n is the sample size, and Zbar is the sample mean of the regressors,
σ1² = σ²[(1/n) + Zbar² / Σ(zi - Zbar)²]
σ2² = σ² / Σ(zi - Zbar)²
ρ = σ12 / (σ1 σ2) ,
where σ12 denotes the covariance between X1 and X2:
σ12 = -σ² Zbar / Σ(zi - Zbar)² .
In the following evaluations, we'll set α = 0, σ² = 1, β = 0.1, and n = 25. The regressor variable is a linear time trend, zi = 1, 2, ..., n.
So, Zbar = (n + 1) / 2; and Σ(zi - Zbar)² = [Σzi² - n Zbar²] = [n (n + 1) (n - 1)] / 12 .
Example: n = 25, β = 0.1 ; so (β / σ2) = 3.606 and ρ = -0.874
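Those numbers are easy to reproduce, and we can then evaluate f(w) on a grid of w values for plotting. The following sketch (mine, and it reuses the ratio_density function defined above) prints 3.606 and -0.874:

```python
import numpy as np

n, alpha, beta, sigma_sq = 25, 0.0, 0.1, 1.0   # alpha = 0, beta = 0.1, sigma^2 = 1
z = np.arange(1, n + 1, dtype=float)           # linear time trend regressor
zbar = z.mean()                                 # (n + 1)/2 = 13
S = np.sum((z - zbar)**2)                       # n(n + 1)(n - 1)/12 = 1300

var1 = sigma_sq * (1.0 / n + zbar**2 / S)       # variance of the OLS intercept estimator
var2 = sigma_sq / S                             # variance of the OLS slope estimator
cov12 = -sigma_sq * zbar / S                    # covariance of the two estimators
sig1, sig2 = np.sqrt(var1), np.sqrt(var2)
rho = cov12 / (sig1 * sig2)

print(round(beta / sig2, 3), round(rho, 3))     # 3.606 -0.874

w = np.linspace(-3.0, 3.0, 601)
fw = ratio_density(w, theta1=alpha, theta2=beta, sigma1=sig1, sigma2=sig2, rho=rho)
```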
One final point. Earlier in this post I commented that if X1 and X2 are standard normal, and independent, then the distribution of W will be Cauchy (i.e., Student-t with one degree of freedom). Setting θ1 = θ2 = ρ = 0; and σ1 = σ2 = 1 in the expressions above, we immediately get the following results:
a(w) = (w² + 1)½ ; b(w) = c = 0; d(w) = 1;
and
f(w) = [π(w² + 1)]⁻¹ ; -∞ < w < ∞ .
This is just the density for a Cauchy random variable, and such a random variable has no finite moments.
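That limiting case also provides a handy sanity check on the ratio_density sketch above: with zero means, unit variances, and zero correlation it should coincide with the standard Cauchy density.

```python
import numpy as np
from scipy.stats import cauchy

w = np.linspace(-10.0, 10.0, 201)
special_case = ratio_density(w, theta1=0.0, theta2=0.0, sigma1=1.0, sigma2=1.0, rho=0.0)
print(np.allclose(special_case, cauchy.pdf(w)))   # should print True
```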
The take-away message here is straightforward:
If you're constructing statistics that are ratios of estimated regression coefficients (and hence are ratios of correlated normal random variables), don't expect their sampling distribution to be normal, or even to have finite moments.
References
Fieller, E. C., 1932. The distribution of the index in a normal bivariate population. Biometrika, 24, 428-440.
Geary, R. C., 1930. The frequency distribution of the quotient of two normal variates. Journal of the Royal Statistical Society, 93, 442-446.
Hinkley, D. V., 1969. On the ratio of two correlated normal random variables. Biometrika, 56, 635-639.
Zellner, A., 1978. Estimation of functions of population means and regression coefficients including structural coefficients: A minimum expected loss (MELO) approach. Journal of Econometrics, 8, 127-158.
David,
As always I enjoy your posts.
This particular topic is one that has long been of interest to me. Ron Bewley and I did some work on your second example of estimating long-run responses in dynamic models (Bewley and Fiebig, 1990). More recently I have revisited the problem in yet another context. This time with discrete choice models where coefficient estimates by themselves are typically not of interest and analysts often estimate ratios of parameters to produce Marginal Rates of Substitution and in particular marginal willingness to pay (WTP) where the parameter in the denominator relates to price or cost.
This literature has moved to greater use of random coefficient models such as mixed logit (McFadden and Train, 2000) and extensions (Fiebig et al., 2010). In the case of normally distributed random coefficients and where WTP is the focus you now have a case where the true, as distinct from estimated, object of interest is a ratio of normals. Because of the non-existence of moments problem, advice is often given that the price coefficient should be specified as a fixed rather than random parameter in order to avoid the problem. Having gone ahead and estimated such a specification, analysts then typically estimate WTP, ignoring the reoccurrence of the problem they initially sought to resolve!
As is often the case, convenience is not necessarily a good objective in specifying an econometric model and so I've never been convinced by the advice to leave price with a fixed parameter. Why shouldn't there be heterogeneity in how people respond to price? In Bartels et al. (2006) we do in fact allow for random coefficients on price and then we essentially use the Hinkley (1969) work to argue that what we have is reasonable for our example. Simulation evidence provided by Meijer and Rouwendal (2006) supports such an approach, and they specifically recommend against treating the cost coefficient as fixed rather than random.
Daly et al. (2011) stress that the non-existence of moments problem is not confined to cases where normality of the cost parameter is assumed. They also point out the risks of simulating from these distributions and inferring incorrectly that the moments exist.
References
Bartels, R., Fiebig, D.G. and van Soest, A. (2006), “Consumers and experts: An econometric analysis of the demand for water heaters”, Empirical Economics, 31, 369–391.
Bewley, R. and Fiebig, D.G. (1990), “Why are long-run parameter estimates so disparate?” The Review of Economics and Statistics, 72(2), 345-349.
Daly, A., Hess, S. and Train, K., (2011), “Assuring Finite Moments for Willingness to Pay in Random Coefficient Models”, Transportation, 39(1), 19-31.
Fiebig, D.G., Keane, M., Louviere, J.J., Wasi, N., (2010), “The generalized multinomial logit: accounting for scale and coefficient heterogeneity”, Marketing Science, 29(3), 393–421.
McFadden, D. and Train, K., (2000), “Mixed MNL models for discrete response”, Journal of Applied Econometrics 15(5), 447–470.
Meijer, E. and Rouwendal, J., (2006), “Measuring welfare effects in models with random coefficients”, Journal of Applied Econometrics 21(1), 227-244.
Denzil - thanks for the thoughtful comments - and the references! D.