Thursday, September 20, 2018

Controlling My Heating Bill Using Bayesian Model Averaging

Where we live, in rural Ontario, we're not connected to "natural gas". Our home furnace runs on propane, and a local supplier sends a tanker to refill our propane tanks on a regular basis during the colder months.

Earlier this month we had to make a decision regarding our contract with the propane retailer. Should we opt for a delivery price that can vary, up or down, throughout the coming fall and winter; or should we "lock in" at a fixed delivery price for the period from October to May of next year?

Now, I must confess that my knowledge of the propane industry is slight, to say the least. I decided that a basic analysis of the historical propane price data might provide some insights to assist in making this decision. It also occurred to me, after doing this, that the analysis that I went through might be of interest to readers, as a simple exercise in forecasting using Bayesian model averaging.

Here are the details...........

To begin with, keep in mind that all I want to get out of this is a handle on the likely general direction of propane prices over the coming months. Historical data for my supplier's retail price aren't available, of course, but the main "driver" will be the spot price in the wholesale market.

So, my analysis was based on monthly data for the spot price of propane (in Canadian dollars) at Mont Belvieu, TX, for the period July 1998 to July 2018.  Texas prices are more relevant than Edmonton prices in my part of Canada. These data are available in a text file on the data page for this blog. (That file includes a brief description of the data and their source.)

I didn't try to construct a structural model of the supply-price of propane using economic and other variables. As I've noted already, I'm really not very familiar with this market. Instead, I opted for a simple time-series analysis based on ARIMA modelling. However, I considered a wide range of such models - actually, 225 of them - and combined the forecasts from each using Bayesian model averaging (BMA).

I used the EViews 10 econometric package for this exercise, and you'll find my EViews workfile on the code page for this blog. (That workfile includes a "text object" titled "data_info" that contains a description and source of the data that I used.)

Here's what the price data look like:

I applied the "Automatic ARIMA Forecasting" procedure in EViews to this price series. I let the automatic algorithm choose between using the level of the Price series or its logarithm. The latter was selected. I also let the algorithm choose the order of differencing of the (log) data, ranging from zero to second-differencing, using the KPSS test for stationarity. First-differencing was chosen.

The series that was then modelled is depicted below. (Of course, later on the forecasting part of my analysis is based on the back-transformed data - namely the original Price series.)
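For readers who'd like to see the transformation spelled out, here's a minimal Python sketch of log-differencing a price series and then back-transforming forecasts to price levels. It isn't the EViews procedure: the function names and the prices are made up for illustration, and simply exponentiating a forecast of the log ignores any retransformation adjustment that a package might apply.

```python
import math

def log_diff(prices):
    """First differences of log prices: log(p_t) - log(p_{t-1})."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

def back_transform(last_price, forecast_log_diffs):
    """Cumulate forecast log-differences from the last observed price
    and exponentiate to get back to price levels."""
    levels = []
    log_level = math.log(last_price)
    for d in forecast_log_diffs:
        log_level += d
        levels.append(math.exp(log_level))
    return levels

# Made-up illustrative prices, NOT the Mont Belvieu data.
prices = [50.0, 55.0, 52.0, 60.0]
dlog = log_diff(prices)

# Sanity check: cumulating the observed log-differences from the
# first price should rebuild the remaining prices.
rebuilt = back_transform(prices[0], dlog)
```

The round trip in the last line is just a check that the two functions are inverses of each other; in a real exercise the second argument would be the model's forecast log-differences.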
Then I used the automated procedure in EViews to estimate all 225 ARIMA models satisfying the following specifications: p = 0, 1, 2, 3, 4; q = 0, 1, 2, 3, 4; ps = 0, 1, 2; qs = 0, 1, 2. Here, "p" and "q" denote the AR and MA orders, while "ps" and "qs" are their seasonal counterparts.
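To see where the figure of 225 comes from, the grid of orders can be enumerated directly: 5 x 5 x 3 x 3 = 225. The short Python sketch below only lists the specifications (the variable names are mine); it doesn't estimate anything.

```python
from itertools import product

# Order ranges as stated in the text.
p_orders = range(5)   # AR order p = 0..4
q_orders = range(5)   # MA order q = 0..4
ps_orders = range(3)  # seasonal AR order ps = 0..2
qs_orders = range(3)  # seasonal MA order qs = 0..2

# Every combination of (p, q, ps, qs).
specs = list(product(p_orders, q_orders, ps_orders, qs_orders))

# Labels in the (p,q)(ps,qs) format.
labels = [f"({p},{q})({ps},{qs})" for p, q, ps, qs in specs]

print(len(specs))
```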

Further details can be found in the text object named "model_spec_info" in the EViews workfile.

Of the 225 competing models, the preferred one (on the basis of the Schwarz Information Criterion, SC) was a simple ARMA(0,1) model for the log-differenced series. This, and the other highly-ranked alternative models, are revealed in the following plot:

Note that the horizontal axis in this plot shows model specifications in the format (p,q)(ps,qs); and that the smaller (more negative) is the SC value, the more a model is preferred. My reasons for using SC in this context, rather than (say) Akaike's Information Criterion, are discussed in this earlier post.
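Ranking by SC amounts to computing the criterion for each fitted model and sorting. Here's a rough Python sketch using the textbook form SC = -2 log L + k log T. The log-likelihoods, parameter counts and sample size are invented purely for illustration, and as far as I know EViews reports the criterion on a per-observation scale, so these numbers wouldn't match its output (the ranking is unaffected by that scaling when T is the same for every model).

```python
import math

def schwarz_criterion(log_likelihood, n_params, n_obs):
    """Textbook Schwarz (Bayesian) information criterion."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical (log-likelihood, number of parameters) pairs,
# keyed by (p,q)(ps,qs) label. These are NOT estimated values.
candidates = {
    "(0,1)(0,0)": (250.0, 2),
    "(1,1)(0,0)": (250.5, 3),
    "(4,4)(2,2)": (255.0, 13),
}
n_obs = 240  # illustrative sample size

sc = {label: schwarz_criterion(ll, k, n_obs)
      for label, (ll, k) in candidates.items()}

# Smaller (more negative) SC is preferred.
ranking = sorted(sc, key=sc.get)
preferred = ranking[0]
```

With these made-up inputs the heavily parameterised model has the highest likelihood but is ranked last, which shows how strongly the log T penalty discourages extra parameters.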

Using the preferred model, I generated forecasts of the original propane price for the period August 2018 to May 2019, with the following results:

Remember, I'm really not too concerned with the details or actual values of the forecasts. I just want to know: "Should I lock in at a fixed price over this forecast horizon?" The fixed price I was offered is moderately attractive, and the model suggests that market prices may rise. So, locking in may be a sensible strategy.

But this is just the result of using one model, and the results may be sensitive to the choice of this model. It may be the "most preferred" one among those that I've considered, but don't all of the models have something to say? After all, there's a huge literature that supports the value of "model averaging" when it comes to time-series forecasting.

With this in mind I then chose the "Forecast Averaging" option in the "Automatic ARIMA Forecasting" procedure in EViews, and combined the forecasts from all 225 models into a single forecast (series). Each individual forecast series was weighted according to the relative SC (also called the Bayesian Information Criterion, or BIC) values. More specifically, the weight given to the ith of the N = 225 models is computed as

$$ w_i = \frac{\exp(-\tfrac{1}{2} SC_i)}{\sum_{j=1}^{N} \exp(-\tfrac{1}{2} SC_j)}, \qquad i = 1, \dots, N. $$

That is, Bayesian Model Averaging (BMA) of the forecasts was used, as these weights approximate the Bayes factors for each model if the sample size is large.
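The weighting step can be sketched in a few lines of Python. I'm assuming the usual exponential form, with each weight proportional to exp(-SC/2) and the weights normalised to sum to one. The SC values and forecasts below are invented, and the function names are mine rather than anything in EViews. Subtracting the smallest SC before exponentiating leaves the weights unchanged but avoids numerical overflow or underflow.

```python
import math

def sc_weights(sc_values):
    """Weights proportional to exp(-SC/2), normalised to sum to one."""
    best = min(sc_values)
    # Shifting by the minimum cancels in the ratio but keeps exp() stable.
    raw = [math.exp(-0.5 * (sc - best)) for sc in sc_values]
    total = sum(raw)
    return [r / total for r in raw]

def average_forecasts(forecasts, weights):
    """Weighted average across models at each forecast horizon."""
    horizon = len(forecasts[0])
    return [sum(w * f[h] for w, f in zip(weights, forecasts))
            for h in range(horizon)]

# Hypothetical SC values and two-step-ahead forecasts for three models.
sc_values = [-2.10, -2.08, -2.00]
forecasts = [
    [60.0, 61.0],
    [60.0, 60.0],
    [59.0, 58.0],
]

weights = sc_weights(sc_values)
combined = average_forecasts(forecasts, weights)
```

Because the weights are non-negative and sum to one, the combined forecast at each horizon always lies between the smallest and largest individual forecasts.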

The resulting weighted average forecast series (in red), together with the 225 individual forecast series, can be seen in the following chart:

The average forecast series is basically flat over the period of interest. Moreover, the majority of the individual models are in accordance with this result. Although the result in this case isn't quite as striking as that for the single "preferred" model given above, it's enough to persuade me to "lock in".

Some major caveats to all of this are in order, including:
  • My forecasts are only for the wholesale spot price of propane in Texas, and they ignore the whims of my local retail supplier when it comes to price mark-ups.
  • The price data I used are measured in Canadian dollars, which is appropriate to my situation. Exchange rate movements will impact on the local retail price of propane, but I'm not brave enough to try and forecast those!  
  • The 2018 hurricane season is upon us already. Exogenous shocks to Texas oil and gas prices due to extreme weather could be significant, but who knows? 
  • And finally, there's the possible TNT (Trump's effect on NAFTA Trade) explosion. What a wild card that could be!
I mentioned this little statistical exercise to one of my neighbours, in passing. When I told him that I'd estimated 225 models his response was that people around here usually limit themselves to no more than five. You can't beat an excellent neighbour with a great sense of humour! 

He went on to comment that he'd checked the Farmers' Almanac while he was down at the local drugstore, and had also concluded that locking in was the way to go. I wish I'd thought of that!

© 2018, David E. Giles


  1. Thanks,
    Really neat. I have used the automatic function in the past, but not the averaging function.

  2. Wouldn't it be a good idea to also examine forecasting performance in-sample, using a hold-out sample?