In a recent post I talked a bit about some of the (large-sample) asymptotic properties of Maximum Likelihood Estimators (MLEs). With some care in its construction, the MLE will be consistent, asymptotically efficient, and asymptotically normal. These are all desirable statistical properties.
Most of you will be well aware that MLEs also enjoy an important, and very convenient, algebraic property - we usually call it "invariance". However, you may not know that this property holds in more general circumstances than those that are usually mentioned in econometrics textbooks. I'll come to that shortly.
In case the concept of invariance is news to you, here's what this property is about. Let's suppose that the underlying joint data density, for our vector of data, y, is parameterized in terms of a vector of parameters, θ. The likelihood function (LF) is just the joint data density, p(y | θ), viewed as if it is a function of the parameters, not the data. That is, the LF is L(θ | y) = p(y | θ). We then find the value of θ that (globally) maximizes L, given the sample of data, y.
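If you'd like to see this in action, here's a minimal numerical sketch, assuming an i.i.d. N(μ, σ²) sample; the simulated data, starting values, and parameter names are purely illustrative:

```python
# A minimal sketch of MLE by direct numerical optimization, assuming an
# i.i.d. N(mu, sigma^2) sample. The data below are simulated for illustration.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(42)
y = rng.normal(loc=5.0, scale=2.0, size=200)  # illustrative sample

def neg_log_likelihood(theta, y):
    mu, log_sigma = theta           # log-parameterize sigma so it stays positive
    sigma = np.exp(log_sigma)
    return -np.sum(norm.logpdf(y, loc=mu, scale=sigma))

# Maximizing L(theta | y) is the same as minimizing the negative log-likelihood
result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(y,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)   # should be close to 5.0 and 2.0
```

Log-parameterizing σ just keeps the optimizer away from the boundary σ ≤ 0; it's a convenience, not anything essential to maximum likelihood.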
That's all fine, but what if our main interest is not in θ itself, but instead we're interested in some function of θ, say φ = f(θ)? For instance, suppose we are estimating a k-regressor linear regression model of the form:
y = Xβ + ε ;  ε ~ N[0, σ²Iₙ] .
Here, θ' = (β', σ²). You'll know that in this case the MLE of β is just the OLS estimator of that vector, and the MLE of σ² is the sum of the squared residuals, divided by n (not by n - k). The first of these estimators is minimum-variance unbiased, while the second is biased. Both estimators are consistent and "best asymptotically normal".
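To illustrate these two claims, here's a small simulation sketch; the design matrix, true coefficients, and error scale are all just assumptions for the example:

```python
# A small check, on simulated data, that the MLE of beta equals the OLS
# estimator, and that the MLE of sigma^2 divides the residual sum of
# squares by n rather than by n - k.
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # intercept + 2 regressors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimator = MLE of beta
resid = y - X @ beta_hat
sigma2_mle = resid @ resid / n                 # MLE of sigma^2 (biased)
sigma2_unbiased = resid @ resid / (n - k)      # the usual unbiased estimator
print(beta_hat, sigma2_mle, sigma2_unbiased)
```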
Now, what if we are interested in estimating the non-linear function, φ = f(θ) = (β₁ + β₂β₃)?
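Anticipating the answer that invariance provides, here's a hypothetical continuation of the regression sketch above (it re-uses beta_hat from that code block): if invariance holds, the MLE of φ is obtained by plugging the coefficient MLEs straight into f.

```python
# Continuing the simulated-regression sketch above. By the invariance
# property, the MLE of phi = beta1 + beta2*beta3 is f evaluated at the
# coefficient MLEs.
b1, b2, b3 = beta_hat      # the three coefficient MLEs from the sketch above
phi_hat = b1 + b2 * b3     # MLE of phi = f(theta), by invariance
print(phi_hat)
```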