Econometrics Beat: Dave Giles' Blog: 03/01/2014

Sunday, March 30, 2014

Understanding the Underlying Assumptions

From time to time I've been known to blog about the importance of fully understanding the assumptions that underlie the various estimators and tests that we use in econometrics. (Here, too.) Gee, I've even gone so far as to suggest that students should learn about these assumptions by taking courses where results are proved formally, not introduced simply through arm-waving.

I'm not going to start griping about all of that again here - it's too nice a Spring day for that.

However, I've just been reading a recent piece in Scientific American that's relevant to my main concern when students are taught "how to do" econometrics, but don't have a proper understanding of the underlying assumptions. That concern is simply that, sooner or later, they'll screw up!

Maybe it won't be the end of the world. The economy probably won't collapse in a big messy heap. Perhaps they'll just lose their job!

The S.A. article was about just this sort of thing - but in the case of neuroscientists, not economists. For the sake of full disclosure I have nothing at all against neuroscientists. In fact, I have a daughter who is doing post-grad. work in just that field at the Florey Institute in Australia.

You can read the article for yourself, and I hope that you will. In a nutshell, numerous influential neuroscience studies that have appeared in the very top scientific journals were based on fundamentally flawed statistical analysis.

To put it really simply, the authors have used statistical tests whose validity requires that the data have been sampled independently, when in fact this requirement is undeniably violated in these studies.
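To get a feel for how much damage this can do, here is a minimal simulation sketch. It's in Python rather than R, and the set-up is entirely my own assumption, not something taken from the S.A. article: the data come in 10 clusters of 20 observations, each cluster shares a common random shock, and the null hypothesis of a zero mean is true. The function names (one_sample_t, rejection_rate) are hypothetical. The usual one-sample t-test is applied as if all 200 observations were independent.

```python
import math
import random
import statistics

def one_sample_t(values):
    """Usual one-sample t-statistic for H0: mean = 0, assuming independence."""
    n = len(values)
    return statistics.fmean(values) / (statistics.stdev(values) / math.sqrt(n))

def rejection_rate(n_clusters, per_cluster, cluster_sd, reps, rng, crit=1.96):
    """Share of replications with |t| > crit when H0 is actually true."""
    rejections = 0
    for _ in range(reps):
        sample = []
        for _ in range(n_clusters):
            # Shock shared by every observation in the cluster (assumed design).
            shared = rng.gauss(0.0, cluster_sd)
            sample.extend([shared + rng.gauss(0.0, 1.0) for _ in range(per_cluster)])
        if abs(one_sample_t(sample)) > crit:
            rejections += 1
    return rejections / reps

rng = random.Random(2014)
# cluster_sd = 0 gives genuinely independent data; cluster_sd = 1 does not.
independent = rejection_rate(10, 20, cluster_sd=0.0, reps=2000, rng=rng)
clustered = rejection_rate(10, 20, cluster_sd=1.0, reps=2000, rng=rng)
print(f"independent: {independent:.3f}, clustered: {clustered:.3f}")
```

With independent data the rejection rate should sit close to the nominal 5%. With the within-cluster dependence it should be many times larger, even though the null hypothesis is true in both cases.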

Oh dear!

To quote Gary Stix, the author of the article:

'Emery N. Brown, a professor of computational neuroscience in the department of brain and cognitive sciences at the MIT-Harvard Division of Health Sciences and Technology, points to a dire need to bolster the level of statistical sophistication brought to bear in neuroscience studies. “There’s a fundamental flaw in the system and the fundamental flaw is basically that neuroscientists don’t know enough statistics to do the right things and there’s not enough statisticians working in neuroscience to help that." '

I'd venture to guess that "screwing up" in the neurosciences might have some unpleasant consequences.

Needless to say, I've sent the link to the S.A. article to my daughter!

You might want to think about this the next time you fire up your favourite econometrics package: Did your friendly econometrics instructor make sure that you really understand the assumptions that need to be satisfied before you can rely on the estimators and tests you're about to use?

Wednesday, March 26, 2014

MCMC for Econometrics Students - IV

This is the fourth in a sequence of posts designed to introduce econometrics students to the use of Markov Chain Monte Carlo (MCMC, or MC²) simulation methods for Bayesian inference. The first three posts can be found here, here, and here, and I'll assume that you've read them already. The emphasis throughout is on the use of the Gibbs sampler.

The first three posts took a look "inside the box", to see what the Gibbs sampler entails. I looked at some R code that could be used to show the sampler "in action". One way to think about those posts is that they were analogous to an explanation of OLS regression, with code that assembled the X matrix, the (X'X) matrix, showed you how to invert the latter matrix, and so on. It's important to understand what is going on when you select "OLS" in your favourite package, but you certainly wouldn't dream of constructing the estimator from first principles every time you wanted to use it.

It's the same in the case of our Bayesian analysis. So, the purpose of this post is to show you how simple it is, in practice, to use R to estimate regression models using Bayesian methods, and to implement Bayesian Posterior Odds analysis for model selection. We can just take advantage of the R packages that have been developed already to help us.

Congratulations to My Colleague!

This afternoon I had the pleasure of attending the annual Teaching, Research, and Service Awards ceremony for our Faculty of Social Sciences here at the University of Victoria. Our Dean, Peter Keller, does a great job with this event. I was especially grateful for the food, as I'd missed out on lunch today!

However, the real highlight was the presentation of the Research Award to my colleague, Kees van Kooten. Kees is an outstanding researcher who holds a (Tier 1) Canada Research Chair in our department, and the award was both richly deserved and long overdue.

Congratulations, Kees!

Monday, March 24, 2014

Thumbs Up; Thumbs Down

People say and do the darnedest things!

I'll let you assign your own "thumbs up" and "thumbs down" to the following gems. I imagine you can guess where I stand on each of them!

'But which is a bigger menace to society, laziness about data or laziness about theory? Theory-laziness is seductive because it's easy - mining for correlations isn't very mentally taxing. But data-laziness is seductive because it's hard - the more complicated and intricate a theory you make, the smarter it makes you feel, even if the theory sucks.

In the past, data-laziness was probably more of a threat to humanity. Since systematic data was scarce, people had a tendency to sit around and daydream about how stuff might work. But now that Big Data is getting bigger and computing power is cheap, theory-laziness seems to be becoming more of a menace. The lure of Big Data is that we can get all our ideas from mining for patterns, but A) we get a lot of false patterns that way, and B) the patterns insidiously and subtly suggest interpretations for themselves, and those interpretations are often wrong.'

(Noah Smith in his post, Which is Better, Data or Theory?)

'........ which raises the question "who should be teaching students econometrics?" Should it be someone like ****, who is basically an applied micro guy, or should it be an econometric theorist?'

(Frances Woolley, commenting on her own post)

'Developing statistical methods is hard and often frustrating work. One of the under appreciated rules in statistical methods development is what I call the 80/20 rule (maybe could even by the 90/10 rule). The basic idea is that the first reasonable thing you can do to a set of data often is 80% of the way to the optimal solution. Everything after that is working on getting the last 20%.'

(Jeff Leek, on the Simply Statistics blog)

'The micro stuff that people like myself and most of us do has contributed tremendously and continues to contribute. Our thoughts have had enormous influence. It just happens that macroeconomics, firstly, has been done terribly and, secondly, in terms of academic macroeconomics, these guys are absolutely useless, most of them. Ask your brother-in-law. I’m sure he thinks, as do 90% of us, that most of what the macro guys do in academia is just worthless rubbish. Worthless, useless, uninteresting rubbish, catering to a very few people in their own little cliques.'

(Chris Auld, reputedly quoting someone else, in a blog post from 2011)

'The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.'

(John Tukey)

'So, we produce our papers, as if on a relentless production line. We cannot wait for inspiration; we must maintain our output. To do our jobs successfully, we need to acquire a fundamental academic skill that the scholars of old generally did not possess; modern academics must be able to keep writing and publishing even when they have nothing to say. ....'

(Michael Billig, as quoted by Timothy Taylor)

Sunday, March 23, 2014

Data Transfer Advice From Francis Smart

I always enjoy reading the posts by Francis Smart on his Econometrics by Simulation blog. A couple of days ago he wrote a nice piece titled, "It is Time for RData Files to Become the Standard for Data Transfer".

Francis made some very good points about the handling of large amounts of data, and he provided some convincing examples regarding the compression rates and opening times for RData files as compared with other options. The comments to his post are also very relevant.

If you're using or exchanging large data files, you'll find Francis's post most helpful.

Friday, March 21, 2014

Death of A. L. Nagar

I was saddened to learn that Anirudh Lal Nagar passed away on 4 February 2014.

Nagar was an exceptional Indian statistician and econometrician who made many fundamental contributions to our discipline. He was 83 years old at the time of his death.

Nagar (left) at the inauguration of the laboratory named in his honour at Jawaharlal Nehru University.

Nagar's work on finite-sample inference in econometrics is especially well known. The term "the Nagar expansion" was coined by Denis Sargan (1974). This technique, proposed by Nagar in 1959 for the k-class estimators, was widely used to determine the sampling properties (bias, variance, etc.) of various simultaneous equations estimators. It came at a time when large-n asymptotics dominated the scene, and finite-sample results were deemed to be "intractable".

Nagar's work influenced a generation of theoretical econometricians, and paved the way for some of the most important results established in our discipline. The impact of his work is reflected in the volume of papers assembled to honour him on his sixtieth birthday (Carter et al., 1990).

He will be greatly missed.

References

Carter, R. A. L., J. Dutta, and A. Ullah (eds.), 1990. Contributions to Econometric Theory and Applications: Essays in Honour of A. L. Nagar. Wiley, New York. (Softcover reprint.)

Nagar, A. L., 1959. The bias and moment matrix of the general k-class estimators of the parameters in simultaneous equations. Econometrica, 27, 573-595.

Sargan, J. D., 1974. The validity of Nagar's expansion for the moments of econometric estimators. Econometrica, 42, 169-176.

Wednesday, March 19, 2014

MCMC for Econometrics Students - III

As its title suggests, this post is the third in a sequence of posts designed to introduce econometrics students to the use of Markov Chain Monte Carlo (MCMC, or MC²) methods for Bayesian inference. The first two posts can be found here and here, and I'll assume that you've read both of them already.

We're going to look at another example involving the use of the Gibbs sampler. Specifically, we're going to use it to extract the marginal posterior distributions from the joint posterior distribution, in a simple two-parameter problem. The problem - which we'll come to shortly - is one in which we actually know the answer in advance. That's to say, the marginalizing can be done analytically with some not-too-difficult integration. This means that we have a "bench mark" against which to judge the results generated by the Gibbs sampler.

Let's look at the inference problem we're going to solve.

MCMC for Econometrics Students - II

This is the second in a set of posts about Markov Chain Monte Carlo (MCMC, or MC²) methods in Bayesian econometrics. The background was provided in this first post, where the Gibbs sampler was introduced.

The main objective of the present post is to convince you that this MCMC stuff actually works!

To achieve this, what we're going to do is work through a simple example - one for which we actually know the answer in advance. That way, we'll be able to check our results from applying the Gibbs sampler against the facts. Hopefully, we'll then be able to see that this technique works - at least for this example!

I'll be using some R script that I've written to take students through this, and it's available on the code page for this blog. I should mention in advance that this code is not especially elegant. It's been written, quite deliberately, in a step-by-step manner to make it relatively transparent to non-users of R. Hopefully, the comments that are embedded in the code will also help.

It's also important to note that this first illustration of the Gibbs sampler in action does not involve the posterior distribution for the parameters in a Bayesian analysis of some model. Instead, we're going to look at the problem of obtaining the marginals of a bivariate normal distribution, when we know the form of the conditional distributions.

In other words - let's proceed one step at a time. The subsequent posts on this topic will be dealing with Bayesian posterior analysis.

Let's take a look at the set-up, and the analysis that we're going to undertake.
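As a rough preview, here is a minimal sketch of a Gibbs sampler for this kind of problem. It is written in Python, so it is not the R script from the code page, and the specifics are my own assumptions for illustration: a standard bivariate normal (zero means, unit variances) with correlation rho = 0.5, 50,000 retained draws, and a burn-in of 1,000. The function name gibbs_bivariate_normal is hypothetical. The sampler uses only the two conditional distributions, and the known N(0, 1) marginals serve as the benchmark.

```python
import math
import random
import statistics

def gibbs_bivariate_normal(rho, n_draws, burn_in, rng):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Only the two full conditionals are used:
        X | Y = y ~ N(rho * y, 1 - rho**2)
        Y | X = x ~ N(rho * x, 1 - rho**2)
    """
    cond_sd = math.sqrt(1.0 - rho ** 2)
    x, y = 5.0, -5.0  # deliberately poor starting values
    xs, ys = [], []
    for i in range(burn_in + n_draws):
        x = rng.gauss(rho * y, cond_sd)  # draw X given the current Y
        y = rng.gauss(rho * x, cond_sd)  # draw Y given the new X
        if i >= burn_in:  # discard the burn-in draws
            xs.append(x)
            ys.append(y)
    return xs, ys

rng = random.Random(42)
xs, ys = gibbs_bivariate_normal(rho=0.5, n_draws=50_000, burn_in=1_000, rng=rng)

mean_x, sd_x = statistics.fmean(xs), statistics.stdev(xs)
mean_y, sd_y = statistics.fmean(ys), statistics.stdev(ys)
# Sample correlation, computed by hand to keep the sketch portable.
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (len(xs) - 1)
corr_xy = cov_xy / (sd_x * sd_y)

print(f"X: mean {mean_x:.3f}, sd {sd_x:.3f}")
print(f"Y: mean {mean_y:.3f}, sd {sd_y:.3f}")
print(f"corr(X, Y): {corr_xy:.3f}")
```

If the sampler is doing its job, the retained draws of X and of Y should each have a mean near 0 and a standard deviation near 1, and their correlation should be near the assumed rho, despite the poor starting values.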

MCMC for Econometrics Students - I

This is the first of a short sequence of posts that discuss some material that I use when teaching Bayesian methods in my graduate econometrics courses.

This material focuses on Markov Chain Monte Carlo (MCMC) methods - especially the use of the Gibbs sampler to obtain marginal posterior densities. This first post discusses some of the computational issues associated with Bayesian econometrics, and introduces the Gibbs sampler. The follow-up posts will illustrate this technique with some specific examples.

So, what's the computational issue here?

A New Statistics Journal

A big hat-tip to Rob Hyndman for (indirectly) alerting me to an interesting new statistics journal: The Annual Review of Statistics and its Application.

There are some terrific review articles in the first issue, and several of these are "must-reads" for students of econometrics and practising econometricians.

I especially like:

Stephen E. Fienberg, What is Statistics?
Tilmann Gneiting and Matthias Katzfuss, Probabilistic Forecasting.
Christian P. Robert, Bayesian Computational Tools.
Radu V. Craiu and Jeffrey S. Rosenthal, Bayesian Computation via Markov Chain Monte Carlo.

Saturday, March 15, 2014

No Pressure Here

This post might make me sound a little grumpy. I hope not. Anyway, here goes.

Comments that are posted on this blog come to me by email for "approval" prior to posting. This is standard practice, and believe me, you wouldn't want to see some of the spam that people try to post as "comments".

Among the comments waiting to be vetted this morning were two at opposite ends of the non-spam spectrum.

One was a grateful and thoughtful comment from "Tom" on my post, ARDL Models - Part I . Here's the exchange in full:

Research on the Interpretation of Confidence Intervals

Like a lot of others, I follow Andrew Gelman's blog with great interest, and today I was especially pleased to see this piece relating to a recent study on the extent to which researchers do or do not interpret confidence intervals correctly.

If you've ever taught an introductory course on statistical inference (from a frequentist, rather than Bayesian perspective), then I don't need to tell you how difficult it can be for students to really understand what a confidence interval is, and (perhaps more importantly) what it isn't!
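One way to make the "what it is" part concrete is a small repeated-sampling experiment. The sketch below is in Python, and everything in it is an assumption chosen for illustration: normal data with a known standard deviation, a true mean of 10, samples of size 25, and the textbook known-sigma interval. The function name coverage is hypothetical. The "95%" describes how often the interval-building procedure captures the true mean across many samples. Any single computed interval either contains the true mean or it doesn't.

```python
import math
import random
import statistics

def coverage(true_mean, sigma, n, reps, level, rng):
    """Share of known-sigma intervals for the mean that contain true_mean."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        # This particular interval either covers the true mean or it does not.
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / reps

rng = random.Random(7)
rate = coverage(true_mean=10.0, sigma=2.0, n=25, reps=10_000, level=0.95, rng=rng)
print(f"share of intervals covering the true mean: {rate:.3f}")
```

The reported share should be close to 0.95. That's a statement about the long-run behaviour of the procedure, not a probability attached to the one interval you happen to have computed from your data.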

It's not only students who have this problem. Statisticians acting as "expert witnesses" in court cases have no end of trouble getting judges to understand the correct interpretation of a confidence interval. And I'm sure we've all seen or heard empirical researchers misinterpret confidence results! For a specific example of the latter, involving a subsequent Nobel laureate, see my old post here!

The study that's mentioned by Andrew today was conducted by four psychologists (Hoekstra et al., 2014) and involved a survey of academic psychologists at three European Universities. The participants included 442 Bachelor students, 34 Master students, and 120 researchers (Ph.D. or faculty members).

Yes, the participants in this survey are psychologists, but we won't hold that against them, and my hunch is that if we changed "psychologist" to "economist" the results wouldn't alter that much!

Before summarizing the findings of this study, let's see what the authors have to say about the correct interpretation of a confidence interval (CI) constructed from a particular sample of data:

Seminars by the Number - Redux

In my second post on this blog, just over three years ago, I took a shot at seminars - economics seminars in particular. There's nothing there that I want to retract. I still remain bemused by the duration of economics seminars; the time that's wasted on details rather than "the big picture"; and the proportion of the allotted time that's taken up with audience "participation".

This being the case, I thought I'd update my earlier suggestion for streamlining these seminars. The focus is on seminars of an econometric nature - very occasionally we actually do have such events in my department.

Here's what I suggested in that earlier post:

Testing for Multivariate Normality

In a recent post I commented on the connection between the multivariate normal distribution and marginal distributions that are normal. Specifically, the latter do not necessarily imply the former.

So, let's think about this in terms of testing for normality.

Suppose that we have several variables which we think may have a joint distribution that's normal. We could test each of the variables for normality, separately, perhaps using the Jarque-Bera LM test. If the null hypothesis of normality was rejected for one or more of the variables, this could be taken as evidence against multivariate normality. However, if normality couldn't be rejected for any of the variables, this wouldn't tell us anything about their joint distribution.
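Here's a minimal sketch of why that last point matters. It's in Python, and the construction is my own illustrative assumption rather than anything from the earlier post: x is N(0, 1) and y equals x with its sign flipped at random, so y is also N(0, 1). The helper jarque_bera is hypothetical and computes the usual statistic, n/6 times (S² + (K - 3)²/4), which is asymptotically chi-square with 2 degrees of freedom under normality. If (x, y) were bivariate normal then x + y would be normal too, so the sum is tested as well.

```python
import random
import statistics

def jarque_bera(values):
    """Jarque-Bera statistic: n/6 * (S**2 + (K - 3)**2 / 4)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

rng = random.Random(123)
n = 5_000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Flip the sign of x with probability 1/2: y is still N(0, 1),
# but the pair (x, y) is not bivariate normal.
y = [xi if rng.random() < 0.5 else -xi for xi in x]
total = [xi + yi for xi, yi in zip(x, y)]

CHI2_2_CRIT_5PCT = 5.991  # 5% critical value, chi-square with 2 d.o.f.
jb_x, jb_y, jb_sum = jarque_bera(x), jarque_bera(y), jarque_bera(total)
share_zero = sum(1 for t in total if t == 0.0) / n

print(f"JB for x: {jb_x:.2f}, JB for y: {jb_y:.2f}, critical value: {CHI2_2_CRIT_5PCT}")
print(f"JB for x + y: {jb_sum:.2f}, share of exact zeros in x + y: {share_zero:.3f}")
```

The statistics for x and y individually will typically be unremarkable, while the statistic for x + y should be enormous, because about half of the sums are exactly zero. Separate tests on the margins can't detect this.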

What we need is a test for multivariate normality itself. Let's see what's available.

March Madness in the Reading Department

It's time for the monthly round-up of recommended reading material.

Gan, L. and J. Jiang, 1999. A test for global maximum. Journal of the American Statistical Association, 94, 847-854.
Nowak-Lehmann, F., D. Herzer, S. Vollmer, and I. Martinez-Zarzosa, 2006. Problems in applying dynamic panel data models: Theoretical and empirical findings. Discussion Paper Nr. 140, IAI, Georg-August-Universität Göttingen.
Olive, D. J., 2004. Does the MLE maximize the likelihood? Mimeo., Department of Mathematics, Southern Illinois University.
Pollock, D. S. G., 2014. Econometrics: An historical guide for the uninitiated. Working Paper No. 14/05, Department of Economics, University of Leicester.
Terrell, G. R., 2002. The gradient statistic. Interface 2002: Computing Science and Statistics, Vol. 34.
Wald, A., 1940. The fitting of straight lines if both variables are subject to error. Annals of Mathematical Statistics, 11, 284-300.
