"Information theory provides a constructive criterion for setting up probability distributions on the basis of partial knowledge, and leads to a type of statistical inference which is called the maximum entropy estimate. It is least biased estimate possible on the given information; i.e., it is maximally noncommittal with regard to missing information."
In other words, when we want to describe noisy data with a statistical model, we should choose the model that is consistent with whatever we actually know, but otherwise has maximum entropy.
A few days ago I posted about the new book, An Information Theoretic Approach to Econometrics, co-authored by George Judge and Ron Mittelhammer. In one of the subsequent comments I was asked what the "information theoretic approach" to econometrics is about.
Here's a brief explanation of the bare bones of Information and Entropy Econometrics (IEE) - hopefully it will get you interested in this approach to econometrics.
Amos Golan, in his Guest Editorial for a special issue of the Journal of Econometrics devoted to IEE (and in honour of Arnold Zellner) in 2007, had this to say:
[Golan, 2007, pp. 379-380]: "Econometrics is the science (and art) of processing information from limited and noisy data. Within econometrics, IEE is the sub-discipline of processing information from limited and noisy data with minimal a priori information on the data-generating process. In particular, IEE is a research that directly or indirectly builds on the foundations of Information Theory and the principle of Maximum Entropy (ME). IEE includes research dealing with statistical inference of economic problems given incomplete knowledge or data, as well as research dealing with the analysis, diagnostics and statistical properties of information measures. The main thread connecting all the estimation methods within IEE is that rather than starting with a pre-specified likelihood, they use the observed data in order to estimate a set of natural weights, or empirical distribution, which is most consistent with the observed sample moments or data. Regardless of the sample size, there are infinitely many sets of ‘‘weights’’ that are consistent with the observed sample, making the problem an under-determined problem. For such a problem one needs to choose one of these infinitely many solutions. This is done by minimizing a certain criterion subject to the observed sample moments or any other function of the data. The objective functions used in all the estimators within IEE are related to Shannon’s entropy (a measure of information). ... Further, all of these measures are entropy divergence measures reflecting a certain entropy distance between two distributions; say a prior and an empirical one. Using these divergence measures form a basis for optimal decision making and for statistical inference. The family of estimators employing these criteria is now known as Information-Theoretic (IT) estimators. The IT family of estimators together with other research-utilizing information measures for analyzing data is naturally connected via the entropy measures used and rests on the foundations of Information Theory. These methods comprise the sub-discipline of IEE."
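To make Golan's "natural weights" idea a little more concrete, here is a minimal sketch of my own (not code from the paper or the book): given a sample and a single hypothesised moment condition, it finds the observation weights that maximize Shannon's entropy while satisfying that moment. The toy data, the hypothesised mean mu0, and all variable names are assumptions made purely for the illustration.

```python
# A toy sketch (my illustration, not Golan's code): maximum-entropy "natural
# weights" for a sample, constrained to satisfy one hypothesised moment.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)
x = rng.normal(loc=1.0, scale=2.0, size=200)   # toy sample (an assumption)
mu0 = 0.8                                      # hypothesised population mean (an assumption)

# Dual of:  max_p  -sum_i p_i log p_i   s.t.  sum_i p_i = 1,  sum_i p_i x_i = mu0.
# The solution is exponentially tilted: p_i(lam) is proportional to exp(lam * x_i).
def dual(lam):
    return np.log(np.mean(np.exp(lam * (x - mu0))))

lam = minimize_scalar(dual, bounds=(-5.0, 5.0), method="bounded").x  # dual is convex in lam
w = np.exp(lam * x)
p = w / w.sum()

print("weighted sample mean:", p @ x)          # approximately mu0, by construction
print("smallest / largest weight:", p.min(), p.max())
```

The exponential-tilting form of the solution is one instance of the entropy-divergence idea in the quotation: here the "prior" is the uniform empirical weighting 1/n, and the weights move away from it only as far as the moment condition requires.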
One way of thinking about IEE is in the context of estimation. We go a step further down the road from Maximum Likelihood estimation, where some strong assumption has to be made about the complete distributional form of the data-generating process (DGP). We go past estimation principles which rely on the optimization of some criterion function that depends on just the moments of the DGP (e.g., Instrumental Variables estimation and GMM). That takes us to Empirical Likelihood methods (Owen, 2001), where the data are used to derive a likelihood function. Finally, we get to the point where we fit a model so as to maximize the entropy, subject only to some moment conditions (Jaynes, 1957a, 1957b).
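As a toy illustration of that last step (again my own sketch, not anything taken from Jaynes's papers): suppose all we know about a six-sided die is that its long-run average is 4.5, rather than the fair-die value of 3.5. The maximum-entropy estimate of the face probabilities is the exponentially tilted distribution computed below; the target mean of 4.5 and the variable names are assumptions made purely for the example.

```python
# A toy version of the constrained maximum-entropy problem, for illustration only:
# find the distribution over the faces of a die with mean 4.5 that is otherwise
# maximally noncommittal (maximum Shannon entropy).
import numpy as np
from scipy.optimize import brentq

faces = np.arange(1, 7)
target_mean = 4.5                       # assumed constraint for the example

def tilted(lam):
    # Max-ent solution has the form p_k proportional to exp(lam * k)
    w = np.exp(lam * faces)
    return w / w.sum()

# Choose lam so that the mean constraint holds exactly.
lam = brentq(lambda l: tilted(l) @ faces - target_mean, -10.0, 10.0)
p = tilted(lam)

print("lambda:", lam)
print("max-entropy pmf:", np.round(p, 4))
print("implied mean:", p @ faces)       # = 4.5
```

With no constraint beyond the probabilities summing to one, the same calculation returns the uniform distribution, which is the sense in which the maximum-entropy estimate is "maximally noncommittal" about anything the data don't tell us.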
Importantly, and perhaps not surprisingly, there's an intimate connection between Bayesian inference and IEE, as has been established by Zellner (1988, 1991), and others.
If you want to keep an eye on developments in this area of econometrics, then you should follow the activities of the Info-Metrics Institute at American University, in Washington, D.C.
Note: The links to the following references will be helpful only if your computer's IP address gives you access to the electronic versions of the publications in question. That's why a written References section is provided.
References
Golan, A., 2007. Information and entropy econometrics – volume overview and synthesis. Journal of Econometrics, 138, 379-387.
Jaynes, E. T., 1957a. Information theory and statistical mechanics. Physical Review Series II, 106, 620–630.
Jaynes, E. T., 1957b. Information theory and statistical mechanics II. Physical Review Series II, 108, 171–190.
Owen, A. B., 2001. Empirical Likelihood, Chapman & Hall/CRC, Boca Raton, FL.
Zellner, A., 1988. Optimal information processing and Bayes’s theorem. The American Statistician, 42, 278–284 (with invited discussion and the author’s reply).
Zellner, A., 1991. Bayesian methods and entropy in economics and econometrics. In: W. T. Grandy, Jr. and L. H. Schick, Eds., Maximum Entropy and Bayesian Methods, Kluwer, Amsterdam, 17-31.
© 2011, David E. Giles
A very successful recent approach to noisy ill-posed inverse problems is maximizing sparsity (typically called compressive sensing, but that name is mainly historical and somewhat misleading). This work shares a theoretical framework, and methods, with optimal coding (e.g., turbo codes) and with inference in Bayesian networks (loopy belief propagation).
I think this work largely subsumes and goes beyond maximal entropy.
Do you know of work in econometrics that uses this approach? I'd appreciate pointers.
Jed: Very interesting comment. I'm not familiar with this, and I'm certainly not aware of any applications to econometrics problems.
This looks like maybe the most general introductory treatment of the recent work: http://arxiv.org/abs/0907.3574
Jed: Thanks! I'll definitely take a look at this.
Jed: The paper by Donoho et al. is REALLY interesting. I've been playing around with some code and I love the results! Thanks for putting me on to this.
Thank you for the pointer to the Information and Entropy Econometrics book. You might be interested in this post, which was a response to a musing from the statistician Andrew Gelman about the use of “free energy” in economics. The connection might not be immediately obvious, but I'll just note that every distribution mentioned is actually a maximum entropy distribution.
http://www.entsophy.net/blog/?p=50
Entsophy - nice post! And I liked your Cauchy post too!