Friday, April 15, 2011

Price Indices From Regression Models

We're all familiar with index numbers. We encounter them every day - perhaps in the form of the CPI or the PPI;  or maybe in the form of some index of production. The effective exchange rate is an index that I'll be posting about in the near future. Share price indices are also familiar, although the DJIA has some very peculiar aspects to its construction that deserve separate consideration on another occasion.

The thing about an index number is that it has only relative content. That's to say, if a particular price index, say P, has a (unit-less) value of 110, that number tells us nothing about the price level at all. It's only when we compare two values of the same index - say, the values in 2010 and in 2011 - that the numbers really mean anything. If P = 100 in 2010 and P = 105 in 2011, then the average price of the bundle of goods being measured by P has changed (risen in this case) by 5% - not by $5 or some other value. In other words, over time, or perhaps across regions, an index number measures proportional changes.

When any index number is constructed, a base period and a base value must first be chosen. For example, we might decide to choose a base year of 1996, and a base value of 100. There's absolutely nothing wrong with choosing a base value of, say, 167.5290 in 1996 - it would just be unnecessarily inconvenient. In that case if the index rose to 184.2819 in 1997, this would imply a relative price change of 100*[(184.2819 - 167.5290) / 167.5290] = 10%. Wouldn't it have been easier if we had chosen the base value to be 100, observed a value of 110 in 1997, and then been able to see immediately that this implied a 10% increase in prices over this one-year period?

Of course, it's the fact that an index measures only relative changes over time that enables us to "re-base" (change the base year) an index without losing any information at all. The numbers in the index series just get scaled, multiplicatively, by the same factor, leaving relative values - and the implications for price changes -  unaltered.
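To make the re-basing point concrete, here is a minimal Python sketch. The two-year series reuses the 1996 and 1997 values from the example above; the function names `rebase` and `pct_change` are purely illustrative choices.

```python
# Index series with the deliberately awkward base value used in the text.
series = {1996: 167.5290, 1997: 184.2819}

def rebase(index, base_year, base_value=100.0):
    """Scale every value by the same factor so that index[base_year] == base_value."""
    factor = base_value / index[base_year]
    return {year: value * factor for year, value in index.items()}

def pct_change(index, start, end):
    """Percentage change in the index between two periods."""
    return 100.0 * (index[end] - index[start]) / index[start]

rebased = rebase(series, 1996)

# The levels differ, but the implied price change is the same in both series.
print(rebased)
print(pct_change(series, 1996, 1997), pct_change(rebased, 1996, 1997))
```

Both versions of the series imply the same 10% increase; only the scale of the numbers changes.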

Now, I'm sure that this is all familiar stuff, and I'm equally sure that you know that many index number series are constructed by our various statistical agencies using a handful of standard formulae. I'm going to use price indices as a concrete example, but if you interchange prices and quantities everywhere in what follows, the same story applies to quantity indices. I'm going to abstract from some important details - taking them into account would not change my message in any way at all, but it would undoubtedly make it a little less transparent. The sort of details that I have in mind include:
  • Choosing a base value so that it averages 100 (say) over a calendar year when the index is constructed for monthly or quarterly data.
  • "Chain-linking" the index values over time.
  • Aggregating sub-indices for expenditure groups or different regions into an aggregate index.
Let's just focus on some really basic stuff - in particular the well-known index formulae that we usually attribute to Laspeyres and to Paasche. Both are widely used in the construction of "official" price and quantity indices around the world. Each method has its strengths and weaknesses, and you probably know that if we take the geometric average of a Laspeyres index value and a Paasche index value, we get the corresponding value for Fisher's "ideal" index. (A geometric mean, rather than an arithmetic mean, is appropriate here because we are averaging ratios, not levels.) So, what follows also has relevance for Fisher's ideal index, which is now also used a lot by national statistical agencies.
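As a small illustration of the Fisher calculation, the following Python sketch takes the geometric mean of two index values. The numbers 1.10 and 1.06 are hypothetical, and `fisher_index` is just an illustrative name.

```python
import math

def fisher_index(laspeyres, paasche):
    """Fisher's 'ideal' index: geometric mean of the Laspeyres and Paasche values."""
    return math.sqrt(laspeyres * paasche)

# Hypothetical index values for one period (base value = 1).
laspeyres, paasche = 1.10, 1.06

fisher = fisher_index(laspeyres, paasche)
arithmetic = (laspeyres + paasche) / 2   # shown only for comparison

print(round(fisher, 4), round(arithmetic, 4))
```

The Fisher value always lies between the two component indices, and it never exceeds their arithmetic mean.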

Now here's the thing. Suppose that I read in the newspaper that the Canadian "core" CPI (say) has risen from 120 to 122.4 over the last year, implying an annual inflation rate of 2%. My first reaction is likely to be that our central bank (the Bank of Canada) is going to be very happy in terms of their inflation-targeting monetary policy. Any further reaction depends a bit on how you view those two numbers - 120 and 122.4. Are they deterministic values, or are they perhaps just "realized values" of an underlying random process? After all, there's lots of potential for measurement error in the surveys that are used to obtain the individual-level or group-level price and expenditure data that are used to construct the overall index.

Had you thought about those CPI numbers in this way before? Maybe not! Personally I'm always skeptical about figures that I read in the newspaper, so why should we presume that there is something sacred about these particular numbers?

There's a literature out there that takes the stance that such index numbers are realized values of a random variable. Among the contributions are those by Allen (1975), Banerjee (1975), Clements & Izan (1987), and Selvanathan (1989). This literature includes some interesting results that show us how we can use some simple econometrics to generate an index number series, and how we can then test some quite important hypotheses - such as, is that rise in the index from 120 to 122.4 statistically significant?

Let's look at the Laspeyres and Paasche price indices, each based on a bundle of n goods, and each with a base value of '1' in period '0'. (Remember, the base period and base value are totally arbitrary, and don't affect the interpretation of the index values.) I'll use $p_{it}$ and $q_{it}$ to denote the price and quantity, respectively, of the ith good in period t. Then, the Laspeyres price index in period t is defined as:

$$P_{0t}^{L} = \frac{\sum_{i=1}^{n} p_{it}\, q_{i0}}{\sum_{i=1}^{n} p_{i0}\, q_{i0}} ; \quad t = 0, 1, 2, \ldots, T \tag{1}$$

and the corresponding Paasche price index is:

$$P_{0t}^{P} = \frac{\sum_{i=1}^{n} p_{it}\, q_{it}}{\sum_{i=1}^{n} p_{i0}\, q_{it}} ; \quad t = 0, 1, 2, \ldots, T \tag{2}$$

In all cases here, and in what follows, the summations are taken from i = 1 to n. The zero subscript on the price index names reminds us that the base period is period '0'.
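Formulae (1) and (2) can be sketched directly in Python. The prices and quantities below are hypothetical values for n = 3 goods, chosen only so the arithmetic is easy to follow.

```python
def laspeyres(p0, pt, q0):
    """Equation (1): period-t prices valued at base-period quantities."""
    return sum(p * q for p, q in zip(pt, q0)) / sum(p * q for p, q in zip(p0, q0))

def paasche(p0, pt, qt):
    """Equation (2): period-t prices valued at period-t quantities."""
    return sum(p * q for p, q in zip(pt, qt)) / sum(p * q for p, q in zip(p0, qt))

# Hypothetical data for n = 3 goods.
p0 = [2.0, 5.0, 10.0]   # base-period prices
pt = [2.2, 5.5, 10.0]   # period-t prices
q0 = [10.0, 4.0, 1.0]   # base-period quantities
qt = [9.0, 4.0, 2.0]    # period-t quantities

index_L = laspeyres(p0, pt, q0)
index_P = paasche(p0, pt, qt)
base_L = laspeyres(p0, p0, q0)   # index value in the base period itself

print(index_L, index_P, base_L)
```

With these numbers the Laspeyres index is 54/50 = 1.08 and the Paasche index is 61.8/58, or about 1.0655. In the base period each index equals 1 by construction.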

It's time to use your Econometrician's eyes and take a really close look at equations (1) and (2). What does the algebraic structure of these formulae remind you of? Well, here's one interpretation of each index number formula. First, consider the following simple regression model:

$$p_{it} = \beta_t\, p_{i0} + \varepsilon_{it} ; \quad i = 1, 2, \ldots, n ; \quad t = 0, 1, \ldots, T \tag{3}$$

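For a fixed time period (t), let's estimate the slope coefficient using data for the n goods, by Instrumental Variables estimation, with $q_{i0}$ as the instrument. With a single regressor $x_i$, no intercept, and instrument $z_i$, the I.V. estimator of the slope is $\sum z_i y_i / \sum z_i x_i$. Here that gives $\sum q_{i0}\, p_{it} / \sum q_{i0}\, p_{i0}$, so the I.V. estimator of $\beta_t$ is precisely $P_{0t}^{L}$. Alternatively, if we estimate (3) by I.V. with $q_{it}$ as the instrument, the estimator of $\beta_t$ is now $P_{0t}^{P}$. There are several things to note about this: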
  1. We will need data on individual (or group) prices and quantities, and usually what are available are price data and expenditure data. The latter can be divided by the prices to give us the quantity data we need, good by good, and period by period.
  2. We can repeat this process period by period, in each case estimating the index value for that period using the n observations for the individual goods.
  3. I.V. estimation is a natural choice here. If price is the dependent variable in (3), then prices are being viewed as being random, including at time t = 0, so we have a random regressor that is likely to be correlated with the error term. As long as quantities are not correlated with the errors, I.V. estimation will yield consistent estimates of the parameters, and hence of the price index values.
  4. We need to think about the error-term assumptions and the relevance of the instrument(s) if we're going to move from the algebraic mechanics of this, to issues of statistical inference.
  5. If you wish, rather than giving the index formulae an I.V. interpretation, you can set up slightly different models from (3), assume a particular form of heteroskedasticity for the error term, and then apply WLS to get the Laspeyres and Paasche formulae. I won't go into that here, but this was suggested by Selvanathan (1991), and extended in the following way by Giles & McCann (1994):
  6. We can take a system of equations of the form (3), with one equation for each year. Then estimating the system as a whole allows us to subsequently test cross-equation (across-time) restrictions on the parameters (i.e., on the underlying price index). We'll take a look at this idea below.
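The equivalence between the I.V. estimator and the two index formulae is easy to check numerically. The sketch below is in Python rather than EViews, uses hypothetical price and expenditure data for n = 4 goods, and recovers quantities from expenditures as described in point 1. The helper name `iv_slope` is an illustrative choice.

```python
def iv_slope(y, x, z):
    """I.V. estimator of b in y = b*x + e (no intercept), using z as the instrument."""
    return sum(zi * yi for zi, yi in zip(z, y)) / sum(zi * xi for zi, xi in zip(z, x))

# Hypothetical prices and expenditures for n = 4 goods.
p0 = [2.0, 5.0, 10.0, 4.0]       # base-period prices
pt = [2.2, 5.5, 10.0, 4.4]       # period-t prices
e0 = [20.0, 20.0, 10.0, 12.0]    # base-period expenditures
et = [19.8, 22.0, 20.0, 13.2]    # period-t expenditures

# Point 1: quantity = expenditure / price, good by good and period by period.
q0 = [e / p for e, p in zip(e0, p0)]
qt = [e / p for e, p in zip(et, pt)]

# Regress p_it on p_i0, as in model (3), with each instrument in turn.
laspeyres_iv = iv_slope(pt, p0, q0)   # instrument: base-period quantities
paasche_iv = iv_slope(pt, p0, qt)     # instrument: period-t quantities

# Direct evaluation of formulae (1) and (2) for comparison.
laspeyres_direct = sum(p * q for p, q in zip(pt, q0)) / sum(p * q for p, q in zip(p0, q0))
paasche_direct = sum(p * q for p, q in zip(pt, qt)) / sum(p * q for p, q in zip(p0, qt))

print(laspeyres_iv, laspeyres_direct)
print(paasche_iv, paasche_direct)
```

Repeating the two `iv_slope` calls for each period t would generate the whole index series, one estimated coefficient per period.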
Now, just to show you some of the things that you can do by following this line of reasoning, I have a small example for you. The data are artificial, and are for the years 2000 to 2011 (T = 11, starting with t = 0). There are twenty expenditure groups (n = 20). These data are in an Excel workbook on the Data page that goes with this blog, and there is an EViews workfile on the Code page for the blog that you can use to play around with my results. Here are the results of my I.V. estimation, with White's heteroskedasticity-consistent standard errors in parentheses:

Year      Laspeyres             Paasche

2000      1.0000  (  -  )       1.0000  (  -  )
2001      1.0425  (0.056)       1.0591  (0.064)
2002      1.0026  (0.069)       0.9997  (0.079)
2003      0.9779  (0.053)       0.9811  (0.053)
2004      1.0448  (0.052)       1.0434  (0.053)
2005      1.1241  (0.038)       1.1360  (0.034)
2006      1.1708  (0.069)       1.1919  (0.057)
2007      1.2162  (0.075)       1.2230  (0.079)
2008      1.3758  (0.072)       1.3810  (0.065)
2009      1.2888  (0.090)       1.2935  (0.101)
2010      1.4535  (0.187)       1.4536  (0.172)
2011      1.4199  (0.110)       1.4534  (0.110)

Each value for the price index is just an estimated regression coefficient, so we also get a standard error, reflecting the uncertainty associated with the point estimate of the index value. Consider the Laspeyres price index values for 2001 and 2002. They suggest that prices fell by 3.83% over that year. Now we can ask, was there a significant drop in prices? Using the standard error (0.056) for the 2001 Laspeyres point estimate of the price index, and forming an approximate 95% confidence interval, you can check that this interval easily covers the 2002 point estimate of 1.0026. This suggests that the apparent price drop is not significant, at the 5% level.
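The rough check described above can be written out in a few lines of Python. The index values and standard error come from the table; the critical value of 1.96 assumes that the asymptotic normal approximation is adequate.

```python
# Laspeyres estimates and the 2001 standard error, taken from the table.
index_2001, se_2001 = 1.0425, 0.056
index_2002 = 1.0026

# Implied percentage price change between 2001 and 2002.
pct_change = 100.0 * (index_2002 - index_2001) / index_2001

# Approximate 95% interval around the 2001 estimate (normal critical value).
z = 1.96
lower = index_2001 - z * se_2001
upper = index_2001 + z * se_2001

# Does the interval cover the 2002 point estimate?
covers_2002 = lower <= index_2002 <= upper

print(round(pct_change, 2), round(lower, 4), round(upper, 4), covers_2002)
```

The interval runs from roughly 0.933 to 1.152, so it comfortably contains 1.0026.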

Of course, we have to be cautious here about constructing such confidence intervals, because we are using I.V. estimation. Given a large enough sample, we could appeal to the result that I.V. estimators are asymptotically normally distributed, in which case my rough confidence interval would be O.K. However, we have only n = 20 observations in each sample here, so we have to be a bit careful. Applying the Jarque-Bera test I find that I can't reject the hypothesis that the errors are normal in each of the 22 regressions I've fitted - but the J-B test is also valid only asymptotically. In addition, even if the errors are normally distributed, there is no result that guarantees that the I.V. estimator has a sampling distribution that is normal in finite samples.

This is an obvious situation where the Bootstrap could be used to simulate the finite-sample sampling distributions of the I.V. estimator in each of its 22 applications here, and to construct appropriate confidence intervals. There are other things that can be done as well, now that we have the basic econometric framework. For example, Giles and McCann (1994) discuss how we can estimate systems of equations of the form (3), and then use the results to test more formally for zero price changes across years, using a Wald test. They also discuss using such systems to test the hypothesis that removing an item from the "bundle" of n goods has no effect on the value of the price index.
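One possible way to set up such a bootstrap is sketched below in Python. This is only an illustration under stated assumptions: it uses a "pairs" bootstrap (resampling whole goods with replacement) and a simple percentile interval, which are choices made for this sketch rather than anything taken from the references. The data are generated on the spot and are not the data set that goes with this post.

```python
import random

def laspeyres(p0, pt, q0):
    """I.V. estimate of beta_t in (3) with q_i0 as the instrument, i.e. formula (1)."""
    return sum(p * q for p, q in zip(pt, q0)) / sum(p * q for p, q in zip(p0, q0))

random.seed(12345)  # fixed seed so the sketch is reproducible

# Artificial data for n = 20 goods (hypothetical, generated here).
n = 20
p0 = [random.uniform(1.0, 10.0) for _ in range(n)]
q0 = [random.uniform(5.0, 50.0) for _ in range(n)]
pt = [1.05 * p * random.uniform(0.9, 1.1) for p in p0]

point_estimate = laspeyres(p0, pt, q0)

# Pairs bootstrap: resample goods (p_i0, p_it, q_i0) with replacement
# and recompute the index for each bootstrap sample.
B = 2000
draws = []
for _ in range(B):
    idx = [random.randrange(n) for _ in range(n)]
    draws.append(laspeyres([p0[i] for i in idx],
                           [pt[i] for i in idx],
                           [q0[i] for i in idx]))
draws.sort()

# Percentile 95% interval from the sorted bootstrap replications.
lower = draws[int(0.025 * B)]
upper = draws[int(0.975 * B) - 1]

print(round(point_estimate, 4), round(lower, 4), round(upper, 4))
```

With only 20 goods per sample, a percentile interval is itself just an approximation, and other bootstrap intervals may behave better. The point is simply that no normality assumption is needed to obtain it.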

I've set up a system for you to play with in the accompanying EViews file, in case you're interested. You'll find that if you estimate the system and then apply a Wald test of the hypothesis that there is no price change between 2001 and 2002, then this hypothesis can't be rejected (consistent with the conclusion above).

So, the main message to take from this particular post is that by treating a Price Index as being stochastic, we can use some fairly standard econometric analysis to test some interesting hypotheses about price inflation. I've found this analysis to be useful in my teaching as it provides a nice connection between an interesting topic in descriptive statistics, and the application of inferential econometric methods. 

Note: The links to the following references will be helpful only if your computer's IP address gives you access to the electronic versions of the publications in question. That's why a written References section is provided.

Allen, R. G. D. (1975). Index Numbers in Theory and Practice. MacMillan, New York.

Banerjee, K. S. (1975). Cost of Living Index Numbers - Practice, Decision and Theory. Marcel Dekker, New York.

Clements, K. W. and H. Y. Izan (1987). The measurement of inflation: A stochastic approach. Journal of Business and Economic Statistics, 5, 339-350.

Giles, D. and E. McCann (1994). Price indices: Systems estimation and tests. Journal of Quantitative Economics, 10, 219-225.

Selvanathan, E. A. (1989). A note on the stochastic approach to index numbers. Journal of Business and Economic Statistics, 7, 471-474.

Selvanathan, E. A. (1991). Standard errors for Laspeyres and Paasche index numbers. Economics Letters, 35, 35-58. 

© 2011, David E. Giles


  1. Caveat: I learned this stuff in an undergrad course many years ago that I mostly slept through.

    “Of course, it’s the fact that an index measures only relative changes over time that enables us to ‘re-base’ (change the base year) an index without losing any information at all.”

    This is correct, but I don’t think it fully captures how a Paasche index is used in the context of GDP deflators, the price indices associated with GDP figures. At least before the introduction of chain-weighted indices, real GDP is not obtained by calculating nominal GDP and deflating by a separately developed price index.

    Instead, nominal GDP and real GDP are measured directly. Suppose you choose 2005 as the base year. You then calculate nominal GDP in 2010 and then compute real GDP by multiplying 2010 quantities with 2005 prices. The implicit GDP deflator is nominal GDP divided by real GDP. For this reason, rebasing real GDP will usually lead to restatements of past GDP growth when goods and services change dramatically.

    Of course, once you have computed real GDP for every year since 1933 or whenever you started doing this, you will have a complete price index series, and you can re-base (in your sense) this series to make the value for any year equal to 100. But this is different than the re-basing (maybe that’s not the right term?) used for GDP deflators.

    I know that the US Census Bureau now focuses on chain-weighted price indices, but many countries still use the traditional method.

  2. Guan - thanks for the very pertinent comment. You're absolutely right, of course. Implicit price deflators, such as the implicit GDP deflator, raise issues beyond those associated with simple Paasche and Laspeyres indices. You definitely didn't sleep through that class!