Wednesday, December 31, 2014

Econometricians' Debt to Alan Turing

The other day, Carol and I went with friends to see the movie, The Imitation Game. I definitely recommend it.

I was previously aware of many of Alan Turing's contributions, especially in relation to the Turing Machine, cryptography, computing, and artificial intelligence. However, I hadn't realized the extent of Turing's use of, and contributions to, a range of important statistical tools. Some of these tools have a direct bearing on Econometrics.

For example:

(HT to Lief Bluck for this one.) In 1935, at the tender age of 22, Turing was appointed a Fellow at King's College, Cambridge, on the basis of his 1934 (undergraduate) thesis in which he proved the Central Limit Theorem. More specifically, he derived a proof of what we now call the Lindeberg-Lévy Central Limit Theorem. He was not aware of Lindeberg's earlier work (1920-1922) on this problem. Lindeberg, in turn, was unaware of Lyapunov's earlier results. (Hint: there was no internet back then!). How many times has your econometrics instructor waved her/his arms and muttered ".......as a result of the central limit theorem....."?
In 1939, Turing developed what Wald and his collaborators would later call "sequential analysis". Yes, that's Abraham Wald who's associated with the Wald tests that you use all of the time. Turing's wartime work on this subject remained classified until the 1980's. Wald's work became well-established in the literature by the late 1940's, and was included in the statistics courses that I took as a student in the 1960's. Did I mention that Wald's wartime associates included some familiar names from economics? Namely, Trygve Haavelmo, Harold Hotelling, Jacob Marschak, Milton Friedman, W. Allen Wallis, and Kenneth Arrow.
The mathematician/statistician I. J. ("Jack") Good was a member of Turing's team at Bletchley Park that cracked the Enigma code. Good was hugely influential in the development of modern Bayesian methods, many of which have found their way into econometrics. He described the use of Bayesian inference in the Enigma project in his "conversation" with Banks (1996). (This work also gave us the Good-Turing estimator - e.g., see Good, 1953.)
Turing (1948) devised the LU ("Lower and Upper") Decomposition that is widely used for matrix inversion and for solving systems of linear equations. Just think how many times you invert matrices when you're doing your econometrics, and how important it is that the calculations are both fast and accurate!
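
To see what the LU decomposition buys you, here's a minimal sketch in Python. It is only an illustration, not Turing's (1948) own presentation: it uses the Doolittle form with no row pivoting, so it assumes that no zero pivot turns up, and the function names lu_decompose and lu_solve and the 2×2 system are made up for the example.

```python
def lu_decompose(a):
    """Doolittle factorization A = L U (unit diagonal in L), without pivoting."""
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Fill row i of U.
        for j in range(i, n):
            upper[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        if upper[i][i] == 0.0:
            raise ZeroDivisionError("zero pivot: this sketch does no row pivoting")
        # Fill column i of L (below the diagonal).
        for j in range(i + 1, n):
            partial = sum(lower[j][k] * upper[k][i] for k in range(i))
            lower[j][i] = (a[j][i] - partial) / upper[i][i]
    return lower, upper


def lu_solve(lower, upper, b):
    """Solve L U x = b by forward, then back, substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):  # forward substitution: L z = b
        z[i] = b[i] - sum(lower[i][k] * z[k] for k in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: U x = z
        tail = sum(upper[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (z[i] - tail) / upper[i][i]
    return x


# Hypothetical 2x2 system, chosen only so the arithmetic is easy to follow.
A = [[4.0, 3.0], [6.0, 3.0]]
L, U = lu_decompose(A)
x = lu_solve(L, U, [10.0, 12.0])
print(L, U, x)
```

Once L and U are in hand, each new right-hand side costs only a forward and a back substitution, which is why solving the system this way is usually preferred to forming an explicit inverse. Production routines also add row pivoting for numerical stability.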

Added, 20 February, 2015: I have recently become aware of Good (1979).

References

Banks, D. L., 1996. A conversation with I. J. Good. Statistical Science, 11, 1-19.

Good, I. J., 1953. The population frequencies of species and the estimation of population parameters. Biometrika, 40, 237-264.

Good, I. J., 1979. A. M. Turing's statistical work in World War II. Biometrika, 66, 393-396.

Turing, A. M., 1948. Rounding-off errors in matrix processes. Quarterly Journal of Mechanics and Applied Mathematics, 1, 287-308.

Monday, December 29, 2014

Multivariate Medians

I'll bet that in the very first "descriptive statistics" course you ever took, you learned about measures of "central tendency" for samples or populations, and these measures included the median. You no doubt learned that one useful feature of the median is that, unlike the (arithmetic, geometric, harmonic) mean, it is relatively "robust" to outliers in the data.

(You probably weren't told that J. M. Keynes provided the first modern treatment of the relationship between the median and the minimization of the sum of absolute deviations. See Keynes (1911) - this paper was based on his thesis work of 1907 and 1908. See this earlier post for more details.)
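
If you'd like to check the link between the median and absolute deviations for yourself, here's a small sketch in Python. The sample (with one deliberate outlier), the grid of candidate values, and the function names are all invented for the illustration; a crude grid search stands in for a proper minimization.

```python
from statistics import mean, median


def sum_abs_dev(data, c):
    """Sum of absolute deviations of the data about the point c."""
    return sum(abs(x - c) for x in data)


def sum_sq_dev(data, c):
    """Sum of squared deviations of the data about the point c."""
    return sum((x - c) ** 2 for x in data)


data = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up sample with one outlier
med = median(data)
avg = mean(data)

# Candidate "centres" from 0.0 to 100.0 in steps of 0.1.
grid = [i / 10 for i in range(0, 1001)]
best_abs = min(grid, key=lambda c: sum_abs_dev(data, c))
best_sq = min(grid, key=lambda c: sum_sq_dev(data, c))

print("median:", med, "minimizer of absolute deviations:", best_abs)
print("mean:", avg, "minimizer of squared deviations:", best_sq)
```

The absolute-deviation criterion is minimized at the median, which ignores how far away the outlier is; the squared-deviation criterion is minimized at the mean, which gets dragged towards it.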

At some later stage you would have encountered the arithmetic mean again, in the context of multivariate data. Think of the mean vector, for instance.

However, unless you took a stats. course in Multivariate Analysis, most of you probably didn't get to meet the median in a multivariate setting. Did you ever wonder why not?

One reason may have been that while the concept of the mean generalizes very simply from the scalar case to the multivariate case, the same is not true for the humble median. Indeed, there isn't even a single, universally accepted definition of the median for a set of multivariate data!
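
To make the point concrete, here's a hedged sketch in Python that compares two of the candidate definitions: the coordinate-wise median, and the geometric (or "spatial") median that minimizes the sum of Euclidean distances to the data. These two are picked purely for illustration, and aren't necessarily the ones discussed in the rest of the post. The three data points are made up, and the Weiszfeld-type iteration below is a bare-bones version that simply stops if it lands exactly on a data point.

```python
import math
from statistics import median


def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def total_distance(points, c):
    """Sum of Euclidean distances from c to every data point."""
    return sum(distance(p, c) for p in points)


def coordinatewise_median(points):
    """Take the ordinary median of each coordinate separately."""
    return tuple(median(coord) for coord in zip(*points))


def geometric_median(points, tol=1e-12, max_iter=1000):
    """Weiszfeld-type iteration, started from the mean vector."""
    dim = len(points[0])
    current = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(max_iter):
        dists = [distance(p, current) for p in points]
        if any(d == 0.0 for d in dists):
            return tuple(current)  # simplification: stop on a data point
        weights = [1.0 / d for d in dists]
        total = sum(weights)
        new = [sum(w * p[k] for w, p in zip(weights, points)) / total
               for k in range(dim)]
        if distance(new, current) < tol:
            return tuple(new)
        current = new
    return tuple(current)


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # made-up bivariate sample
cw = coordinatewise_median(points)
gm = geometric_median(points)
print("coordinate-wise median:", cw, "total distance:", total_distance(points, cw))
print("geometric median:", gm, "total distance:", total_distance(points, gm))
```

For this sample the two "medians" are different points, and the geometric median has the smaller total distance to the data, so which one you call "the" median depends on the criterion you adopt.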

Let's take a closer look at this.

Econometrics in the Post-Cowles Era

My thanks to Olav Bjerkholt for alerting me to a special edition of the open access journal, Oekonomia, devoted to the History of Econometrics. Olav recently guest-edited this issue, and here's part of what he has to say in the Editor's Foreword:

"Up to World War II there were competing ideas, approaches, and multiple techniques in econometrics but no ruling paradigm. Probability considerations played a very minor role in econometric work due to an unfounded but widely accepted view that economic time series were not amenable to such analysis. The formalization of econometrics undertaken at Cowles Commission in Chicago in the late 1940s inspired by the Probability Approach of Trygve Haavelmo, and often referred to as the CC-Haavelmo paradigm, placed the whole problem of determining economic relationships firmly within a probabilistic framework and made most traditional techniques redundant. A key assumption in this paradigm as it was conceived is that models to be estimated have been fixed with certainty by a priori formulated theories alone, it can thus be labeled as “theory-oriented”. It exerted a strong influence in the ensuing years, not least as consolidated standard econometrics propagated by textbooks. The history of econometrics, as written in the 1980s and 1990s, covered mainly the period up to and including the Cowles Commission econometric achievements.

Haavelmo made a remark at the beginning of his influential monograph that econometric research aimed at connecting economic theory and actual measurements, using appropriate tools as a bridge pier, “[b]ut the bridge itself was never completely built.” From around 1960 there arose increasingly discontents of different kinds with the CC-Haavelmo paradigm, not least because of the key assumption mentioned above. The bridge needed mending but the ideas of how to do it went in different directions and led eventually to developments of new paradigms and new directions of econometric analysis. This issue comprises four articles illuminating developments in econometrics in the post-Cowles era."

These four articles are:

Marc Nerlove, “Individual Heterogeneity and State Dependence: From George Biddell Airy to James Joseph Heckman”.
Duo Qin, “Inextricability of Confluence and Autonomy in Econometrics”.
Aris Spanos, “Reflections on the LSE Tradition in Econometrics: a Student’s Perspective”.
Nalan Basturk, Cem Cakmakli, S. Pinar Ceyhan, and Herman van Dijk, “Historical Developments in Bayesian Econometrics after Cowles Foundation Monographs 10, 14”.

Coincidentally, the last of these papers was the topic of another post of mine last month, before I was aware of this special journal issue. I'm looking forward to reading the other three contributions. If they're even half as good as the one by Basturk et al., I'm in for a treat!

Saturday, December 27, 2014

The Demise of a "Great Ratio"

Once upon a time there was a rule of thumb that there were 20 sheep in New Zealand for every person living there. Yep, I kid you not. The old adage used to be "3 million people; 60 million sheep".

I liked to think of this as another important "Great Ratio". You know - in the spirit of the famous "Great Ratios" suggested by Klein and Kosobud (1961) in the context of economic growth, and subsequently analysed and augmented by a variety of authors. The latter include Simon (1990), Harvey et al. (2003), Attfield and Temple (2010), and others.

After all, it's said that (at least in the post-WWII era) the economies of both Australia and New Zealand "rode on the sheep's back". If that's the case, then the New Zealand Sheep Ratio (NZSR) may hold important clues for economic growth in that country.

My interest in this matter right now comes from reading an alarming press release from Statistics New Zealand, a few days ago. The latest release of the Agricultural Production Statistics for N.Z. revealed that the (provisional) figure for the number of sheep was (only!) 26.9 million at the end of June 2014 - down 4% from 2013.

I was shocked, to say the least! Worse was to come. The 2014 figure puts the number of sheep in N.Z. at the lowest level since 1943!

I'm sure you can understand my concern. We'd better take a closer look at this, and what it all means for the NZSR:

End-of-Semester Econometrics Examination

My introductory graduate econometrics class has just finished up. The students sat the final examination yesterday. They did really well!

If you'd like to try your hand, you can find the exam. here.

(Added later: A rough solution is available here.)

Sunday, December 14, 2014

The Rotterdam Model

Ken Clements (U. Western Australia) has sent me a copy of his paper from this month, co-authored with Grace Gao, "The Rotterdam Demand Model Half a Century On".

How appropriate it is to see this important landmark in econometrics honoured in this way. And how fitting that this paper is written by two Australian econometricians, given the enormous contributions to empirical demand analysis that have come from that group of researchers - including Ken and his many students - over the years. (But more on this another time.)

Any student who wants to see applied econometrics at its best can do no better than look at the rich empirical literature on consumer demand. That literature will take you beyond the "toy" models that you meet in your micro. courses, to really serious ones: the Linear Expenditure System, the Rotterdam Model, the Almost Ideal Demand System, and others. Where better to see the marriage of sound economic modelling, interesting data, and innovative statistical methods? In short - "econometrics".

Back to Ken and Grace's paper, though. Here's the abstract:

When Did You Last Check Your Code?

Chris Blattman (Columbia U.) has a blog directed towards international development, economics, politics, and policy.

In a post yesterday, Chris asks: "What happens when a very good political science journal checks the statistical code of its submissions?"

The answer is not pretty!

His post relates to the practice of the Quarterly Journal of Political Science to subject empirical papers to in-house replication. This involves running the code provided by authors. He cites a batch of 24 such papers in which only 4 were found to be error-free.

Have you checked your own code recently?

Friday, December 12, 2014

"The Error Term in the History of Time Series Econometrics"

While we're on the subject of the history of econometrics ......... blog-reader Mark Leeds kindly drew my attention to this interesting paper published by Duo Qin and Christopher Gilbert in Econometric Theory in 2001.

I don't recall reading this paper before - my loss.

Mark supplied me with a pre-publication version of the paper, which you can download here if you don't have access to Econometric Theory.

Here's the abstract:

"We argue that many methodological confusions in time-series econometrics may be seen as arising out of ambivalence or confusion about the error terms. Relationships between macroeconomic time series are inexact and, inevitably, the early econometricians found that any estimated relationship would only fit with errors. Slutsky interpreted these errors as shocks that constitute the motive force behind business cycles. Frisch tried to dissect further the errors into two parts: stimuli, which are analogous to shocks, and nuisance aberrations. However, he failed to provide a statistical framework to make this distinction operational. Haavelmo, and subsequent researchers at the Cowles Commission, saw errors in equations as providing the statistical foundations for econometric models, and required that they conform to a priori distributional assumptions specified in structural models of the general equilibrium type, later known as simultaneous-equations models (SEM). Since theoretical models were at that time mostly static, the structural modelling strategy relegated the dynamics in time-series data frequently to nuisance, atheoretical complications. Revival of the shock interpretation in theoretical models came about through the rational expectations movement and development of the VAR (Vector AutoRegression) modelling approach. The so-called LSE (London School of Economics) dynamic specification approach decomposes the dynamics of modelled variable into three parts: short-run shocks, disequilibrium shocks and innovative residuals, with only the first two of these sustaining an economic interpretation."

More on the History of Econometrics From Olav Bjerkholt

If you look back at the various posts on this blog in the category of History of Econometrics, you'll find that I've often mentioned papers written by Olav Bjerkholt, of the University of Oslo.

Olav has drawn my attention to two more recent papers of his. They're titled, "Econometric Society 1930: How it Got Founded", and "Glimpses of Henry Schultz in Mussolini's Italy 1934". The second of these is co-authored with Daniela Parisi.

Here's the abstract from the first paper:

"The Econometric Society was founded at an “organization meeting” in December 1930. The invitations had been issued by Irving Fisher, Ragnar Frisch and, Charles F. Roos. In June the same year they had sent a form letter to a list of 31 scholars to solicit advice about establishing an international association “to help in gradually converting economics into a genuine and recognized science.” The responses of these scholars from ten different countries are set out at some length in the paper. Rather than persevering in building a constituency of adherents on which a society could be founded the three initiators decided to rush ahead and sent out invitations to an organization meeting to found the Econometric Society at short notice. The paper discusses possible reasons for the change of pace, indicating that Schumpeter had a decisive role, and gives an account of the deliberations of the organization meeting founding the Econometric Society."

The second paper covers material that I was previously quite unaware of. Here's the abstract:

"Professor of Economics at the University of Chicago, Henry Schultz, spent a sabbatical year in Europe in 1933/34 working on his forthcoming monograph The Theory and Measurement of Demand (Schultz 1938). During the year he found time to travel in 6-7 countries meeting economists and other scholars. The article describes and comments his seven weeks long visit to Italy in March-April 1934. The glimpses of Henry Schultz in Italy are provided by Schultz’s own brief diary notes during that visit. Henry Schultz was a prominent member of the Econometric Society and had been present at the organization meeting of the Society in 1930. In Italy he met with practically all the leading econometricians in Italy. Schultz was particularly interested in the stand taken by Italian economists on Mussolini’s Corporate State and also in the situation of Jews under fascism. Schultz followed a tourist trail in Italy visiting also Roman and Etruscan remains and a number of places of Christian worship."

My thanks to Olav for alerting me to these two fascinating papers.

Thursday, December 11, 2014

New Features in EViews 9 (Beta)

When you get a chance to check out the "beta" release of EViews 9 (which current users can download from here), you'll find lots of new features.

Many of these relate to the EViews interface, data handling, and graphs and tables. And then (of course) there are lots of new Econometrics goodies! To summarize them, under the headings used in the documentation:

Computation
• Automatic ARIMA forecasting of a series
• Forecast evaluation and combination testing
• Forecast averaging
• VAR Forecasting

Estimation
• Autoregressive Distributed Lag regression (ARDL) with automatic lag selection
• ML and GLS ARMA estimation
• ARFIMA models
• Pooled mean group estimation of panel data ARDL models
• Threshold regression
• New optimization engine

Testing and Diagnostics
• Unit root tests with a structural break
• Cross-section Dependence Tests
• Panel Effects Tests

I just had to highlight ARDL models. My earlier posts on these models (here and here) attracted a lot of readers, and many questions and comments.

I've been promising a follow-up post on this topic for some time. You can guess why I've been holding off, and what one of my upcoming posts will be about!

Two Non-Problems!

I just love Dick Startz's "byline" on the EViews 9 Beta Forum:

"Non-normality and collinearity are NOT problems!"

Why do I like it so much? Regarding "normality", see here, and here. As for "collinearity": see here, here, here, and here.

EViews 9 - Beta Version

If you're an EViews user then you'll be delighted to learn that the beta version of the new EViews 9 was released this morning.

Provided that you have EViews 8.1 on your machine, you can download the beta version of this latest release from http://register1.eviews.com/beta/.

Here's what you'll see:

There are no delays - just make sure that you know if you need the 32-bit or 64-bit version.

I've had the opportunity to play around with the alpha version of EViews 9 over the past few weeks (thanks, Gareth!), and I can assure you that there are lots of really nice goodies in store for you.

I'll be devoting a few posts to some of the new features over the next short while, so stay tuned.

Monday, December 8, 2014

Marc Bellemare on Social Media

Marc Bellemare has been catching my attention recently. On Saturday I had a post that mentioned his talk on "How to Publish Academic Papers". I know that a lot of you have followed this up already.

Today, I just have to mention another of his talks, given last Friday, titled "Social Media for (Academic) Economists". Check out his blog post about the talk, and then look at the slides that are linked there.

Yep, I agree with pretty much everything he has to say. And nope, we're not related!

Sunday, December 7, 2014

"Mastering 'Metrics"

Mastering 'Metrics: The Path from Cause to Effect, by Joshua Angrist and Jörn-Steffen Pischke, is to be published by Princeton University Press later this month. This new book from the authors of Mostly Harmless Econometrics: An Empiricist's Companion is bound to be well received by students and researchers involved in applied empirical economics. My guess is that the biggest accolades will come from those whose interest is in empirical microeconomics.

You can download and preview the Introduction and Chapter 1.

Apparently the book focuses on:

"The five most valuable econometric methods, or what the authors call the Furious Five - random assignment, regression, instrumental variables, regression discontinuity designs, and differences in differences."

If this sounds interesting to you, then make sure that you take a look at Peter Dizikes' recent post, "How to Conduct Social Science Research", on the World Economic Forum website.

Saturday, December 6, 2014

Advice on Publishing

I've put in a lot of time over the years as an Editor, Associate Editor, or Editorial Board member, for a number of economics and statistics journals, ranging from Journal of Econometrics and Econometric Theory, to Journal of International Trade & Economic Development. I've also refereed more papers than I care to think about.

Students, rightly, are eager to get the scoop on how to get their work published in good journals. They often talk to me about this. My suggestion would be to read, and follow the advice given by Marc Bellemare in his talk, "How to Publish Academic Papers".

Just do it!

(HT to David Stern for unwittingly making me aware of Marc's talk.)

© 2014, David E. Giles

Thursday, December 4, 2014

More on Prediction From Log-Linear Regressions

My therapy sessions are actually going quite well. I'm down to just one meeting with Jane a week, now. Yes, there are still far too many log-linear regressions being bandied around, but I'm learning to cope with it!

Last year, in an attempt to be helpful to those poor souls, I had a post about forecasting from models with a log-transformed dependent variable. I felt decidedly better after that, so I thought I'd follow up with another good deed.

Let's see if it helps some more:

Statistical Controls Are Great - Except When They're Not!

A blog post today, titled, "How Race Discrimination in Law Enforcement Actually Works", caught my eye. Seemed like an important topic. The post, by Ezra Klein, appeared on Vox.

I'm not going to discuss it in any detail, but I think that some readers of this blog will enjoy reading it. Here are a few selected passages, to whet your collective appetite:

"You see it all the time in studies. "We controlled for..." And then the list starts. The longer the better." (Oh boy, can I associate with that. Think of all of those seminars you've sat through.......)

"The problem with controls is that it's often hard to tell the difference between a variable that's obscuring the thing you're studying and a variable that is the thing you're studying."

"The papers brag about their controls. They dismiss past research because it had too few controls." (How many seminars was that?)

"Statistical Controls Are Great - Except When They're Not!"

Here's Your Reading List!

As we count the year down, there's always time for more reading!

Birg, L. and A. Goeddeke, 2014. Christmas economics - A sleigh ride. Discussion Paper No. 220, CEGE, University of Gottingen.
Geraci, A., D. Fabbri, and C. Monfardini, 2014. Testing exogeneity of multinomial regressors in count data models: Does two stage residual inclusion work? Working Paper 14/03, Health, Econometrics and Data Group, University of York.
Li, Y. and D. E. Giles, 2014. Modelling volatility spillover effects between developed stock markets and Asian emerging stock markets. International Journal of Finance and Economics, in press.
Ma, J. and M. Wohar, 2014. Expected returns and expected dividend growth: Time to rethink an established literature. Applied Economics, 46, 2462-2476.
Qin, D., 2014. Resurgence of instrument variable estimation and fallacy of endogeneity. Economics Discussion Papers No. 2014-42, Kiel Institute for the World Economy.
Romano, J. P. and M. Wolf, 2014. Resurrecting weighted least squares. Working Paper No. 172, Department of Economics, University of Zurich.
Tchatoka, F.D., 2014. Specification tests with weak and invalid instruments. Working Paper No. 2014-05, School of Economics, University of Adelaide.

Friday, November 28, 2014

The A. R. Bergstrom Prize, 2015

Tuesday, November 25, 2014

Thanks for Downloading!

In an earlier post I mentioned a paper that I co-authored with Xiao Ling. The paper is "Bias reduction for the maximum likelihood estimator of the parameters of the generalized Rayleigh family of distributions", Communications in Statistics - Theory and Methods, 2014, 43, 1778-1792.

Over the period January to July 2014, this paper was downloaded 144 times from the journal's website. That made it the 6th most downloaded paper for that period - out of all papers downloaded from all volumes/issues of Communications in Statistics - Theory and Methods.

My guess is that some of you were responsible for this. Thanks!

Wednesday, November 19, 2014

The Rise of Bayesian Econometrics

A recent discussion paper by Basturk et al. (2014) provides us with (at least) two interesting pieces of material. First, they give a very nice overview of the origins of Bayesian inference in econometrics. This is a topic dear to my heart, given that my own Ph.D. dissertation was in Bayesian Econometrics; and I began that work in early 1973 - just two years after the appearance of Arnold Zellner's path-breaking book (Zellner, 1971).

Second, they provide an analysis of how the associated contributions have been clustered, in terms of the journals in which they have been published. The authors find, among other things, that:

"Results indicate a cluster of journals with theoretical and applied papers, mainly consisting of Journal of Econometrics, Journal of Business and Economic Statistics, and Journal of Applied Econometrics which contains the large majority of high quality Bayesian econometrics papers."

A couple of the papers coming out of my dissertation certainly fitted into that group - Giles (1975) and Giles and Rayner (1979).

The authors round out their paper as follows:

"...with a list of subjects that are important challenges for twenty-first century Bayesian econometrics: Sampling methods suitable for use with big data and fast, parallelized and GPU, calculations, complex models which account for nonlinearities, analysis of implied model features such as risk and instability, incorporating model incompleteness, and a natural combination of economic modeling, forecasting and policy interventions."

So, there's lots more to be done!

References

Basturk, N., C. Cakmakli, S. P. Ceyhan, and H. K. van Dijk, 2014. On the rise of Bayesian econometrics after Cowles Foundation monographs 10 and 14. Tinbergen Institute Discussion Paper TI 2014-085/III.

Giles, D.E.A., 1975. Discriminating between autoregressive forms: A Monte Carlo comparison of Bayesian and ad hoc methods. Journal of Econometrics, 3, 229-248.

Giles, D.E.A. and A.C. Rayner, 1979. The mean squared errors of the maximum likelihood and natural-conjugate Bayes regression estimators. Journal of Econometrics, 11, 319-334.

Zellner, A., 1971. An Introduction to Bayesian Inference in Econometrics. Wiley, New York.

Sunday, November 16, 2014

Orthogonal Regression: First Steps

When I'm introducing students in my introductory economic statistics course to the simple linear regression model, I like to point out to them that fitting the regression line so as to minimize the sum of squared residuals, in the vertical direction, is just one possibility.

They see, easily enough, that squaring the residuals deals with the positive and negative signs, and that this prevents obtaining a "visually silly" fit through the data. Mentioning that one could achieve this by working with the absolute values of the residuals provides the opportunity to mention robustness to outliers, and to link the discussion back to something they know already - the difference between the behaviours of the sample mean and the sample median, in this respect.

We also discuss the fact that measuring the residuals in the vertical ("y") direction is intuitively sensible, because the model is purporting to "explain" the y variable. Any explanatory failure should presumably be measured in this direction. However, I also note that there are other options - such as measuring the residuals in the horizontal ("x") direction.

Perhaps more importantly, I also mention "orthogonal residuals". I mention them. I don't go into any details. Frankly, there isn't time; and in any case this is usually the students' first exposure to regression analysis and they have enough to be dealing with. However, I've thought that we really should provide students with an introduction to orthogonal regression - just in the simple regression situation - once they've got basic least squares under their belts.

The reason is that orthogonal regression comes up later on in econometrics in more complex forms, at least for some of these students; but typically they haven't seen the basics. Indeed, orthogonal regression is widely used (and misused - Carroll and Ruppert, 1996) to deal with certain errors-in-variables problems. For example, see Madansky (1959).
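
Here's a minimal sketch, in Python, of the three ways of measuring the residuals in the simple regression case. It isn't taken from the rest of the post: the data are made up, the function name fitted_slopes is hypothetical, and the orthogonal slope uses the standard closed-form expression for minimizing the sum of squared perpendicular distances, which implicitly treats x and y as being measured in comparable units.

```python
import math


def fitted_slopes(x, y):
    """Slopes (all expressed as dy/dx) from three residual directions."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    if sxy == 0.0:
        raise ValueError("x and y are uncorrelated in the sample")
    vertical = sxy / sxx      # usual OLS: residuals measured in the y direction
    horizontal = syy / sxy    # regress x on y, then invert the fitted slope
    orthogonal = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    return vertical, horizontal, orthogonal


# Made-up data, chosen so that the arithmetic is easy to check by hand.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
vertical, horizontal, orthogonal = fitted_slopes(x, y)
print(vertical, horizontal, orthogonal)
```

With a positive sample covariance the orthogonal slope sits between the other two: the "vertical" fit is the flattest line and the "horizontal" fit is the steepest.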

That got me thinking. Maybe what follows is a step towards filling this gap.

Cointegration - The Definitive Overview

Recently released, this discussion paper from Søren Johansen, will give you the definitive overview of cointegration that you've been waiting for.

Titled simply, "Time Series: Cointegration", Johansen's paper has been prepared for inclusion in the 2nd. edition of The International Encyclopedia of the Social and Behavioural Sciences, 2014. In the space of just sixteen pages, you'll find pretty much everything you need or want to know about cointegration.

To get you started, here's the abstract:

"An overview of results for the cointegrated VAR model for nonstationary I(1) variables is given. The emphasis is on the analysis of the model and the tools for asymptotic inference. These include: formulation of criteria on the parameters, for the process to be nonstationary and I(1), formulation of hypotheses of interest on the rank, the cointegrating relations and the adjustment coefficients. A discussion of the asymptotic distribution results that are used for inference. The results are illustrated by a few examples. A number of extensions of the theory are pointed out."

Enjoy!

Tuesday, November 11, 2014

Normality Testing & Non-Stationary Data

Bob Jensen emailed me about my recent post about the way in which the Jarque-Bera test can be impacted when temporally aggregated data are used. Apparently he publicized my post on the listserv for Accounting Educators in the U.S. He also drew my attention to a paper from two former presidents of the AAA: "Some Methodological Deficiencies in Empirical Research Articles in Accounting", by Thomas R. Dyckman and Stephen A. Zeff, Accounting Horizons, September 2014, 28 (3), 695-712. (Here.)

Bob commented that an even more important issue might be that our data may be non-stationary. Indeed, this is always something that should concern us, and regular readers of this blog will know that non-stationary data, cointegration, and the like have been the subject of a lot of my posts.

In fact, the impact of unit roots on the Jarque-Bera test was mentioned in this old post about "spurious regressions". There, I mentioned a paper of mine (Giles, 2007) in which I proved that:

Read Before You Cite!

Note to self - file this post in the "Look Before You Leap" category!

Looking at The New Zealand Herald newspaper this morning, this headline caught my eye:

"How Did Sir Owen Glenn's Domestic Violence Inquiry Get $7 Billion Figure Wrong?"

$7 Billion? Even though that's (only) New Zealand dollars, it still sounds like a reasonable question to ask, I thought. And (seriously) this is a really important issue, so, I read on.

Here's part of what I found (I've added the red highlighting):

Reverse Regression Follow-up

At the end of my recent post on Reverse Regression, I posed three simple questions - homework for the students among you, if you will.

Here they are again, with brief "solutions":

A Source of Irritation

I very much liked one of ECONJEFF's posts last week, titled "Epistemological Irritation of the Day".

The bulk of it reads:

" "A direct test of the hypothesis is looking for significance in the relationship between [one variable] and [another variable]."

No, no, no, no, no. Theory makes predictions about signs of coefficients, not about significance levels, which also depend on minor details such as the sample size and the amount of variation in the independent variable of interest present in the data."

He was right to be upset - and see his post for the punchline!

Saturday, November 8, 2014

Econometric Society World Congress

Every five years, the Econometric Society holds a World Congress. In those years, the usual annual European, North American, Latin American, and Australasian meetings are held over.

The first World Congress was held in Rome, in 1960. I've been to a few of these gatherings over the years, and they're always great events.

The next World Congress is going to be held in Montréal, Canada, in August of 2015. You can find all of the details right here.

Something to look forward to!

A Reverse Regression Inequality

Suppose that we fit the following simple regression model, using OLS:

y_i = βx_i + ε_i . (1)

To simplify matters, suppose that all of the data are calculated as deviations from their respective sample means. That's why I haven't explicitly included an intercept in (1). This doesn't affect any of the following results.

The OLS estimator of β is, of course,

b = Σ(x_iy_i) / Σ(x_i²) ,

where the summations are for i = 1 to n (the sample size).

Now consider the "reverse regression":

x_i = αy_i + u_i . (2)

The OLS estimator of α is

a = Σ(x_iy_i) / Σ(y_i²).

Clearly, a ≠ (1 / b), in general. However, can you tell if a ≥ (1 / b), or if a ≤ (1 / b)?

The answer is, "yes", and here's how you do it.
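One way to see it (which may differ in detail from the argument in the full post): the product of the two slopes is a·b = (Σx_iy_i)² / (Σx_i² Σy_i²), which is just the squared sample correlation, and so it can't exceed one by the Cauchy-Schwarz inequality. It follows that a ≤ (1 / b) when b > 0, and a ≥ (1 / b) when b < 0. Here is a small numerical check in Python. The function name, the simulated data, and the seed are all hypothetical, chosen only for illustration.

```python
import random

def reverse_regression_slopes(x, y):
    """Return (b, a): OLS slope of y on x, and of x on y, using mean deviations."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    xd = [xi - mx for xi in x]
    yd = [yi - my for yi in y]
    sxy = sum(p * q for p, q in zip(xd, yd))
    sxx = sum(p * p for p in xd)
    syy = sum(q * q for q in yd)
    return sxy / sxx, sxy / syy

# Illustrative (made-up) data: a noisy linear relationship.
rng = random.Random(123)
x = [rng.gauss(0, 1) for _ in range(50)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]

b, a = reverse_regression_slopes(x, y)
r_squared = a * b  # product of the two slopes = squared sample correlation

# Flipping the sign of y flips the sign of both slopes.
b_neg, a_neg = reverse_regression_slopes(x, [-yi for yi in y])
```

With these data, r_squared lies between zero and one, and the direction of the inequality between a and (1 / b) switches with the sign of b.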

The Econometrics of Temporal Aggregation V - Testing for Normality

This post is one of a sequence of posts, the earlier members of which can be found here, here, here, and here. These posts are based on Giles (2014).

Some of the standard tests that we perform in econometrics can be affected by the level of aggregation of the data. Here, I'm concerned only with time-series data, and with temporal aggregation. I'm going to show you some preliminary results from work that I have in progress with Ryan Godwin. Although these results relate to just one test, our work covers a range of testing problems.

I'm not supplying the EViews program code that was used to obtain the results below - at least, not for now. That's because what I'm reporting is based on work in progress. Sorry!

As in the earlier posts, let's suppose that the aggregation is over "m" high-frequency periods. A lower case symbol will represent a high-frequency observation on a variable of interest; and an upper-case symbol will denote the aggregated series.

So,

Y_t = y_t + y_{t - 1} + ......+ y_{t - m + 1} .

If we're aggregating monthly (flow) data to quarterly data, then m = 3. In the case of aggregation from quarterly to annual data, m = 4, etc.
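As a concrete illustration of this type of aggregation, here is a short Python sketch. It assumes that the low-frequency series is obtained by taking the sum above at the end of each non-overlapping block of m observations, and that any incomplete block at the end of the sample is dropped. The function name and the data are hypothetical.

```python
def aggregate_flow(y, m):
    """Sum a high-frequency flow series over non-overlapping blocks of m periods.

    Assumption: an incomplete block at the end of the sample is dropped.
    """
    n_low = len(y) // m
    return [sum(y[i * m:(i + 1) * m]) for i in range(n_low)]

monthly = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]  # made-up monthly flows
quarterly = aggregate_flow(monthly, 3)   # m = 3: monthly to quarterly
annual = aggregate_flow(quarterly, 4)    # m = 4: quarterly to annual
```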

Now, let's investigate how such aggregation affects the performance of the well-known Jarque-Bera (1987) (J-B) test for the normality of the errors in a regression model. I've discussed some of the limitations of this test in an earlier post, and you might find it helpful to look at that post (and this one) at this point. However, the J-B test is very widely used by econometricians, and it warrants some further consideration.
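For reference, the J-B statistic is JB = (n / 6)[S² + (K - 3)² / 4], where S and K are the sample skewness and kurtosis of the residuals. Under the null of normality it is asymptotically chi-square with 2 degrees of freedom, for which the upper-tail probability is exp(-JB / 2). The following Python sketch computes it for a list of residuals. It is only an illustration with a hypothetical function name, and it is not the EViews code behind the results discussed in this post.

```python
import math

def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value) for a list of residuals."""
    n = len(values)
    mean = sum(values) / n
    # Central sample moments (divisor n, as in the usual J-B formula).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square(2) variate is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

jb_stat, jb_pvalue = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])  # toy data
```

Keep in mind that the chi-square approximation can be poor in small samples, so the p-value from a toy sample like this one shouldn't be taken seriously.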

Consider the following small Monte Carlo experiment.

The Village Idiot Hypothesis

Yesterday, I received an email from Michael Belongia (Economics, U. Mississippi). With it, he kindly sent a copy of the Presidential Address to the American Agricultural Economics Association in 1979. The talk, given by Richard A. King, was titled "Choices and Consequences". It makes interesting reading, and many of the points that King makes are just as valid today as they were in 1979.

He has a lot to say about empirical consumer demand studies, especially as they relate to agricultural economics. In particular, he's rightly critical of the very restrictive characteristics of the Linear Expenditure System (Stone, 1954), and the Rotterdam Model (Theil, 1975). However, many of the objections that King raised were overcome just a year later with the "Almost Ideal Demand System" introduced by Deaton and Muellbauer (1980).

However, it was my recent post on hypothesis testing that prompted Michael to email me, and King makes some telling observations on this topic in his address.

I liked this remark about the need to be explicit about the hypotheses that we have in mind when undertaking empirical work:

King also talks about "The Village Idiot Hypothesis", in relation to the preoccupation with testing hypotheses such as β = 0.

As Michael said to me in his email, "When, as in one example, decades of research have indicated that some elasticity is -0.2, why do new papers test whether β = 0 rather than β = -0.2?"

If you have access to the American Journal of Agricultural Economics, I recommend that you take a look at Richard King's address, as he makes several other important points that practitioners should take to heart.

References

King, R. A., 1979. Choices and consequences. American Journal of Agricultural Economics, 61, 839-848.

Deaton, A. and J. Muellbauer, 1980. An almost ideal demand system. American Economic Review, 70, 312-326.

Stone, R., 1954. Linear expenditure systems and demand analysis: An application to the pattern of British demand. Economic Journal, 64, 511-527.

Theil, H., 1975. Theory and Measurement of Consumer Demand, Vol. 1. North-Holland, Amsterdam.

Update to ARDL Add-In for EViews

In a post back in January, I drew attention to an Add-In for EViews that allows you to estimate ARDL models. The Add-In was written by Yashar Tarverdi. At that time, one limitation was that the Add-In handles only two variables, X and Y.

Judging by the questions and feedback I get about ARDL models, I know you'll be delighted to know that this limitation has been eased considerably. News out of @IHSEViews on Twitter this morning announces that the Add-In will now handle up to ten variables.

Good job! And thanks!

Wednesday, November 5, 2014

Computing Power Curves

In a recent post I discussed some aspects of the distributions of some common test statistics when the null hypothesis that's being tested is actually false. One of the things that we saw there was that in many cases these distributions are "non-central", with a non-centrality parameter that increases as we move further and further away from the null hypothesis being true.

In such cases, it's the value of the non-centrality parameter that determines the power of tests. For a particular sample size and choice of significance level, this parameter usually depends on all of the other features of the testing problem in question.
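The simplest case I can think of makes the point. Suppose (and this is purely an assumption for illustration) that the test statistic is N(δ, 1), with δ = 0 under the null. Its square is then non-central chi-square with one degree of freedom and non-centrality parameter δ². The power of the two-sided test can be traced out with nothing more than the standard normal c.d.f., as in this Python sketch (the function name is hypothetical):

```python
from statistics import NormalDist

def z_test_power(delta, alpha=0.05):
    """Power of a two-sided test based on a statistic that is N(delta, 1)."""
    std_normal = NormalDist()
    crit = std_normal.inv_cdf(1 - alpha / 2)
    # Reject when the statistic is below -crit or above +crit.
    return std_normal.cdf(-crit - delta) + 1 - std_normal.cdf(crit - delta)

# Points on the power curve for delta = 0, 0.5, ..., 4.
power_curve = [(d / 2, z_test_power(d / 2)) for d in range(0, 9)]
```

At δ = 0 the "power" is just the significance level, and it rises towards one as δ (and hence the non-centrality parameter) grows.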

To illustrate this in more detail, let's consider a linear multiple regression model:

Central and Non-Central Distributions

Let's imagine that you're teaching an econometrics class that features hypothesis testing. It may be an elementary introduction to the topic itself; or it may be a more detailed discussion of a particular testing problem. We're not talking here about a course on Bayesian econometrics, so in all likelihood you'll be following the "classical" Neyman-Pearson paradigm.

You set up the null and alternative hypotheses. You introduce the idea of a test statistic, and hopefully, you explain why we try to find one that's "pivotal". You talk about Type I and Type II errors; and the trade-off between the probabilities of these errors occurring.

You might talk about the idea of assigning a significance level for the test in advance of implementing it; or you might talk about p-values. In either case, you have to emphasize to the class that in order to apply the test itself, you have to know the sampling distribution of your test statistic for the situation where the null hypothesis is true.

Why is this?

Confusing Charts

Today's on-line edition of The New Zealand Herald includes an article titled "Junior rugby putting little kids in harm's way". The article included two charts, presented one after the other, and explicitly intended to be viewed as a pair. Here they are:

Friday, October 31, 2014

Recent Reading

From my "Recently Read" list:

Born, B. and J. Breitung, 2014. Testing for serial correlation in fixed-effects panel data models. Econometric Reviews, in press.
Enders, W. and J. Lee, 2011. A unit root test using a Fourier series to approximate smooth breaks. Oxford Bulletin of Economics and Statistics, 74, 574-599.
Götz, T. B. and A. W. Hecq, 2014. Testing for Granger causality in large mixed-frequency VARs. RM/14/028, Maastricht University, SBE, Department of Quantitative Economics.
Kass, R. E., 2011. Statistical inference: The big picture. Statistical Science, 26, 1-9.
Qian, J. and L. Su, 2014. Structural change estimation in time series regressions with endogenous variables. Economics Letters, in press.
Wickens, M., 2014. How did we get to where we are now? Reflections on 50 years of macroeconomic and financial econometrics. Discussion Paper No. 14/17, Department of Economics and Related Studies, University of York.

Thursday, October 30, 2014

Testing......1, 2, 3, ......

I often think that most courses in econometric theory are somewhat unbalanced. Much more attention is given to estimation principles and estimator properties than is given to the principles of hypothesis testing and the properties of tests.

This always strikes me as somewhat ironic. In econometrics we're at least as interested in testing some interesting economic hypotheses as we are in estimating some particular parameters.

For that reason, even my introductory undergraduate "economic statistics" course always includes some basic material on the properties of tests. By this I mean properties such as Uniformly Most Powerful; Locally Most Powerful; Consistent; and Unbiased. (With respect to the last two properties I do mean test properties, not estimator properties.)

After all, when you're first learning about hypothesis testing, it's important to know that there are sound justifications for using the particular tests that are being taught. We don't use the "t-test" simply because it was first proposed by a brewer! Or, for that matter, because tables of critical values are in an appendix of our text book. We use it because, under certain circumstances, it is Uniformly Most Powerful (against one-sided alternative hypotheses).

If tests aren't motivated and justified in this sort of way, we're just dishing out recipes to our students. And I've never liked the cookbook approach to the teaching of statistics or econometrics.

There's a lot to blog about when it comes to hypothesis testing. In some upcoming posts I'll try and cover some testing topics which, in my view, are given too little attention in traditional econometrics courses.

To whet your appetite - the first two will be about the distributions of some standard test statistics when the null hypothesis is false; and how this information can be used to compute some power curves.

Wednesday, October 29, 2014

Econometrics Term Test

A few days ago the students in my introductory graduate Econometrics course had their mid-term test.

Here's the test, and a brief solution.

How did you fare?

Tuesday, October 28, 2014

Would You Like Some Hot Potatoes?

O.K., I know - that was a really cheap way of getting your attention.

However, it worked, and this post really is about Hot Potatoes - not the edible variety, but some teaching apps. from "Half-Baked Software" here at the University of Victoria.

To quote:

"The Hot Potatoes suite includes six applications, enabling you to create interactive multiple-choice, short-answer, jumbled-sentence, crossword, matching/ordering and gap-fill exercises for the World Wide Web. Hot Potatoes is freeware, and you may use it for any purpose or project you like."

I've included some Hot Potatoes multiple choice exercises on the web pages for several of my courses for some years now. Recently, some of the students in my introductory graduate econometrics course mentioned that these exercises were quite helpful. So, I thought I'd share the Hot Potatoes apps. for that course with readers of this blog.

There are eight multiple-choice exercise sets in total, and you can run them from here:

Quiz 1 ; Quiz 2 ; Quiz 3 ; Quiz 4; Quiz 5 ; Quiz 6 ; Quiz 7 ; Quiz 8 .

I've also put the HTML and associated PDF files on the code page for this blog. If you're going to download them and use them on your own computer or website, just make sure that the PDF files are located in the same folder (directory) as the HTML files. I plan to extend and update these Hot Potatoes exercises in the near future, but hopefully some readers will find them useful in the meantime. © 2014, David E. Giles

Friday, October 17, 2014

Econometric Research Resources

The following page, put together by John Kane at the Department of Economics, SUNY-Oswego, has some very useful links for econometrics students and researchers: Econometric Research Resources.

Monday, October 13, 2014

Illustrating Asymptotic Behaviour - Part III

This is the third in a sequence of posts about some basic concepts relating to large-sample asymptotics and the linear regression model. The first two posts (here and here) dealt with items 1 and 2 in the following list, and you'll find it helpful to read them before proceeding with this post:

1. The consistency of the OLS estimator in a situation where it's known to be biased in small samples.
2. The correct way to think about the asymptotic distribution of the OLS estimator.
3. A comparison of the OLS estimator and another estimator, in terms of asymptotic efficiency.

Here, we're going to deal with item 3, again via a small Monte Carlo experiment, using EViews.

From the website of the Royal Swedish Academy of Sciences:

The Prize in Economic Sciences 2014

The Royal Swedish Academy of Sciences has decided to award the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel for 2014 to Jean Tirole, Toulouse 1 Capitole University, France

“for his analysis of market power and regulation”.

Sunday, October 12, 2014

Illustrating Asymptotic Behaviour - Part II

This is the second in a sequence of three posts that deal with large-sample asymptotics - especially in the context of the linear regression model. The first post dealt with item 1 in this list:

1. The consistency of the OLS estimator in a situation where it's known to be biased in small samples.
2. The correct way to think about the asymptotic distribution of the OLS estimator.
3. A comparison of the OLS estimator and another estimator, in terms of asymptotic efficiency.

No surprise, but this post deals with item 2. To get the most out of it, I strongly recommend reading the first post before proceeding.

Illustrating Asymptotic Behaviour - Part I

Learning the basics about the (large sample) asymptotic behaviour of estimators and test statistics is always a challenge. Teaching this material can be challenging too!

So, in this post and in two more to follow, I'm going to talk about a small Monte Carlo experiment that illustrates some aspects of the asymptotic behaviour of the OLS estimator. I'll focus on three things:

1. The consistency of the OLS estimator in a situation where it's known to be biased in small samples.
2. The correct way to think about the asymptotic distribution of the OLS estimator.
3. A comparison of the OLS estimator and another estimator, in terms of asymptotic efficiency.
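To give a flavour of the first item, here is a minimal Python sketch of a Monte Carlo experiment of this general type. It is not the EViews experiment used in these posts. As an assumption for illustration, it uses a stationary AR(1) model, y_t = ρy_{t-1} + ε_t, fitted by OLS with an intercept. This is a textbook case in which the OLS estimator of ρ is biased in finite samples, yet consistent. All of the names, the value of ρ, the sample sizes, and the seed are hypothetical.

```python
import random
import statistics

def ols_slope(x, y):
    """OLS slope from a simple regression of y on x (with an intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_ar1_estimates(n, rho=0.5, reps=1000, seed=42):
    """OLS estimates of rho from `reps` simulated AR(1) samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # Start from the stationary distribution of the process.
        y_prev = rng.gauss(0, 1) / (1 - rho ** 2) ** 0.5
        series = [y_prev]
        for _ in range(n):
            y_prev = rho * y_prev + rng.gauss(0, 1)
            series.append(y_prev)
        estimates.append(ols_slope(series[:-1], series[1:]))
    return estimates

results = {}
for n in (20, 100, 500):
    est = simulate_ar1_estimates(n)
    # Store (simulated bias, simulated standard deviation) for each n.
    results[n] = (statistics.mean(est) - 0.5, statistics.stdev(est))
```

In a sketch like this you should see a clearly negative bias when n is small, with both the bias and the dispersion of the estimates shrinking as n grows. That is what consistency leads us to expect.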

October Reading

October already!

Chauvel, C. and J. O'Quigley, 2014. Tests for comparing estimated survival functions. Biometrika, 101, 535-552.
Choi, I., 2014. Unit root tests for dependent and heterogeneous micropanels. Discussion Paper No. 2014-04, Research Institute for Market Economy, Sogang University.
Cho, J. S. and H. White, 2014. Testing the equality of two positive-definite matrices with application to information matrix testing. Discussion Paper, School of Economics, Yonsei University.
Hansen, B. E., 2013. Model averaging, asymptotic risk, and regressor groups. Quantitative Economics, in press.
Miller, J. I., 2014. Simple robust tests for the specification of high-frequency predictors of a low-frequency series. Mimeo., Department of Economics, University of Missouri.
Owen, A. B. and P. A. Roediger, 2014. The sign of the logistic regression coefficient. American Statistician, in press.
Westfall, P. H., 2014. Kurtosis as peakedness, 1905-2014. R.I.P. American Statistician, 68, 191-195.

Saturday, September 20, 2014

The (Non-) Standard Asymptotics of Dickey-Fuller Tests

One of the most widely used tests in econometrics is the (augmented) Dickey-Fuller (DF) test. We use it in the context of time series data to test the null hypothesis that a series has a unit root (i.e., it is I(1)), against the alternative hypothesis that the series is I(0), and hence stationary. If we apply the test to a first-differenced time series, then the null is that the series is I(2), and the alternative hypothesis is that it is I(1), and so on.

Suppose that the time series in question is {Y_t; t = 1, 2, 3, ......, T}. The so-called "Dickey-Fuller regression" is a least squares regression of the form:

ΔY_t = [α + β t] + γY_{t-1} + [Σ δ_j ΔY_{t-j}] + ε_t . (1)

Here, terms in square brackets are optional; and of these the "p" ΔY_{t-j} terms (j = 1, 2, ..., p) are the "augmentation terms", whose role is to ensure that there is no autocorrelation in the equation's residuals.

Standard econometrics packages allow for three versions of (1):

No drift - no trend: that is, the (α + β t) terms are omitted.
Drift - no trend: the intercept (drift term) is included, but the linear trend term is not.
Drift - and - trend: both of the α and (β t) terms are included.
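To make the mechanics concrete, here is a Python sketch of the simplest of these three versions: no drift, no trend, and (as a further simplifying assumption) no augmentation terms, so p = 0. The function name, the simulated series, and the seed are hypothetical. The "t-ratio" for γ is computed in the usual way, but under the unit root null it has to be compared with Dickey-Fuller critical values, not with Student-t ones.

```python
import random

def df_statistic(y):
    """Return (gamma_hat, t_ratio) from the regression dY_t = gamma * Y_{t-1} + e_t."""
    lagged = y[:-1]
    diffs = [y[t] - y[t - 1] for t in range(1, len(y))]
    sxx = sum(v * v for v in lagged)
    gamma_hat = sum(l * d for l, d in zip(lagged, diffs)) / sxx
    resid = [d - gamma_hat * l for l, d in zip(lagged, diffs)]
    s2 = sum(e * e for e in resid) / (len(diffs) - 1)  # one estimated coefficient
    return gamma_hat, gamma_hat / (s2 / sxx) ** 0.5

rng = random.Random(2014)

# A random walk: the unit root null is true (gamma = 0).
random_walk = [0.0]
for _ in range(500):
    random_walk.append(random_walk[-1] + rng.gauss(0, 1))

# A stationary AR(1) series with coefficient 0.2 (so gamma = -0.8).
stationary = [0.0]
for _ in range(500):
    stationary.append(0.2 * stationary[-1] + rng.gauss(0, 1))

gamma_rw, t_rw = df_statistic(random_walk)
gamma_st, t_st = df_statistic(stationary)
```

For the stationary series the t-ratio is hugely negative, and the null is rejected. For the random walk the estimate of γ is close to zero.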

For example, here's the dialogue box that you see when you go to apply the DF test using the EViews package:

Least Squares, Perfect Multicollinearity, & Estimable Functions

This post is essentially an extension of another recent post on this blog. I'll assume that you've read that post, where I discussed the problem of solving linear equations of the form Ax = y, when the matrix A is singular.

Let's look at how this problem might arise in the context of estimating the coefficients of a linear regression model, y = Xβ + ε. In the previous post, I said:

"Least squares estimation leads to the so-called "normal equations":

X'Xb = X'y . (1)

If the regressor matrix, X, has k columns, then (1) is a set of k linear equations in the k unknown elements of β. You'll recall that if X has full column rank, k, then (X'X) also has full rank, k, and so (X'X)^-1 is well-defined. We then pre-multiply each side of (1) by (X'X)^-1, yielding the familiar least squares estimator for β, namely b = (X'X)^-1X'y.

So, as long as we don't have "perfect multicollinearity" among the regressors (the columns of X), we can solve (1), and the least squares estimator is defined. More specifically, a unique estimator for each individual element of β is defined.

What if there is perfect multicollinearity, so that the rank of X, and of (X'X), is less than k? In that case, we can't compute (X'X)^-1, we can't solve the normal equations in the usual way, and we can't get a unique estimator for the (full) β vector."

I promised that I'd come back to the statement, "we can't get a unique estimator for the (full) β vector". Now's the time to do that.
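A tiny example helps to fix ideas. Take the classic "dummy variable trap": an intercept plus two dummy variables that sum to one. The normal equations then have infinitely many solutions, but certain functions of the coefficients (such as the fitted values, or the difference between the two dummy coefficients) are the same for every solution. The Python sketch below checks this for two different solutions. The data and all of the names are made up for illustration.

```python
# Regressors: intercept, d1, d2, with d1 + d2 = 1 (perfect multicollinearity).
y = [1.0, 3.0, 4.0, 8.0]
d1 = [1.0, 1.0, 0.0, 0.0]
d2 = [0.0, 0.0, 1.0, 1.0]
X = [[1.0, a, b] for a, b in zip(d1, d2)]

def fitted_values(b):
    """Compute Xb."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, b)) for row in X]

def normal_equation_gap(b):
    """Compute X'(y - Xb); it is the zero vector when b solves X'Xb = X'y."""
    resid = [yi - fi for yi, fi in zip(y, fitted_values(b))]
    return [sum(X[i][j] * resid[i] for i in range(len(y))) for j in range(3)]

# The group means are 2 and 6, so b = (c, 2 - c, 6 - c) solves (1) for ANY c.
solution_1 = [0.0, 2.0, 6.0]      # c = 0
solution_2 = [10.0, -8.0, -4.0]   # c = 10

gap_1 = normal_equation_gap(solution_1)
gap_2 = normal_equation_gap(solution_2)
contrast_1 = solution_1[1] - solution_1[2]
contrast_2 = solution_2[1] - solution_2[2]
```

Both vectors satisfy the normal equations, and they give identical fitted values and an identical value for the contrast between the two dummy coefficients, even though the individual coefficients differ wildly.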

"Inverting" Singular Matrices

You can only invert a matrix if that matrix is non-singular. Right? Actually, that's wrong.

You see, there are various sorts of inverse matrices, and most of them apply to the situation where the original matrix is singular.
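The best known of these is probably the Moore-Penrose (generalized) inverse, G, of a matrix A. It's defined by four conditions: AGA = A; GAG = G; and both AG and GA are symmetric. As a hand-worked illustration in Python, take the singular matrix A = uu' with u = (1, 2)'. For a symmetric rank-one matrix of this form, G = A / (u'u)². The helper function and the example are hypothetical, and in practice you'd use a numerical routine such as numpy.linalg.pinv.

```python
def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

A = [[1.0, 2.0], [2.0, 4.0]]   # A = uu' with u = (1, 2)'; its determinant is zero
G = [[a_ij / 25.0 for a_ij in row] for row in A]   # (u'u)^2 = 5^2 = 25

AGA = matmul(matmul(A, G), A)
GAG = matmul(matmul(G, A), G)
AG = matmul(A, G)
GA = matmul(G, A)
```

You can verify that AGA reproduces A, GAG reproduces G, and that AG and GA are symmetric, even though A has no ordinary inverse.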

Before elaborating on this, notice that this fact may be interesting in the context of estimating the coefficients of a linear regression model, y = Xβ + ε. Least squares estimation leads to the so-called "normal equations":

X'Xb = X'y . (1)

If the regressor matrix, X, has k columns, then (1) is a set of k linear equations in the k unknown elements of β. You'll recall that if X has full column rank, k, then (X'X) also has full rank, k, and so (X'X)^-1 is well-defined. We then pre-multiply each side of (1) by (X'X)^-1, yielding the familiar least squares estimator for β, namely b = (X'X)^-1X'y.

So, as long as we don't have "perfect multicollinearity" among the regressors (the columns of X), we can solve (1), and the least squares estimator is defined. More specifically, a unique estimator for each individual element of β is defined.

What if there is perfect multicollinearity, so that the rank of X, and of (X'X), is less than k? In that case, we can't compute (X'X)^-1, we can't solve the normal equations in the usual way, and we can't get a unique estimator for the (full) β vector.

Let's look carefully at the last sentence above. There are two parts of it that bear closer scrutiny:

The Econometrics of Temporal Aggregation - IV - Cointegration

My previous post on aggregating time series data over time dealt with some of the consequences for unit roots. The next logical thing to consider is the effect of such aggregation on cointegration, and on testing for its presence.

As in the earlier discussion, we'll consider the situation where the aggregation is over "m" high-frequency periods. A lower case symbol will represent a high-frequency observation on a variable of interest; and an upper-case symbol will denote the aggregated series. So,

Y_t = y_t + y_{t - 1} + ......+ y_{t - m + 1} .

If we're aggregating quarterly (flow) data to annual data, then m = 4. In the case of aggregation from monthly to quarterly data, m = 3, and so on.

We know, from my earlier post, that if y_t is integrated of order one (i.e., I(1)), then so is Y_t.

Suppose that we also have a second temporally aggregated series:

X_t = x_t + x_{t - 1} + ......+ x_{t - m + 1} .

Again, if x_t is I(1) then X_t is also I(1). There is the possibility that x_t and y_t are cointegrated. If they are, is the same true for the aggregated series, X_t and Y_t?

Unit Root Tests and Seasonally Adjusted Data

We all know why it's common to "seasonally adjust" economic time series data that are recorded on a monthly, quarterly, etc. basis. Students are sometimes surprised to learn that in some countries certain such time series are reported only in seasonally adjusted terms. You can't get the original (unadjusted) data. This applies to some U.S. economic data, for example.

Does this matter?
