Econometrics Beat: Dave Giles' Blog: Solution to the Segmented Regression Problem

Tuesday, October 22, 2013

Solution to the Segmented Regression Problem

Here's my solution to the "segmented regression" problem that I posed yesterday. Thanks for the comments and suggestions!

You'll recall that what we wanted to do was to end up with a fitted least squares "line" looking like this:

In particular, the "kink" in the line is at a pre-determined point - in this example when x = 30.

Here's how we can achieve this:

The basic regression model is

y_i = α + β x_i + ε_i ; i = 1, 2, ..., n . (1)

Suppose that we want the line segments to join when x = x*. Then, define a dummy variable, D_i, such that:

D_i = 0 ; if x_i ≤ x*

D_i = 1 ; if x_i > x*
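As a minimal NumPy sketch of this definition, the dummy is just an indicator on x. The x values below are made up for illustration, and x* = 30 as in the example. Note that an observation sitting exactly at x* gets D_i = 0.

```python
import numpy as np

x_star = 30.0  # the known join-point x*
x = np.array([10.0, 25.0, 30.0, 31.0, 45.0])  # illustrative x values only

# D_i = 0 if x_i <= x*, and D_i = 1 if x_i > x*
D = (x > x_star).astype(float)
print(D)  # [0. 0. 0. 1. 1.]
```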

The two line segments in the graph above have different intercepts and different slopes, so you would probably think of modifying model (1) to become:

y_i = α + β x_i + γ D_i + δ (x_i D_i) + ε_i ; i = 1, 2, ..., n . (2)

That's a good start, but we still have to force the join-point to be at x*.
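To see why model (2) on its own is not enough: with all four regressors, OLS on (2) reproduces two completely separate regressions, one on each side of x*, so nothing ties the segments together. The sketch below assumes NumPy and uses simulated data (the data-generating values are hypothetical, not the blog's data set). The quantity γ + δx* is the vertical gap between the two fitted segments at x*.

```python
import numpy as np

rng = np.random.default_rng(0)
n, x_star = 100, 30.0

# Hypothetical simulated data: a line with a kink at x* plus noise
x = np.sort(rng.uniform(0.0, 60.0, n))
y = 5.0 + 0.5 * x + 1.5 * np.maximum(x - x_star, 0.0) + rng.normal(0.0, 3.0, n)

# Dummy: D_i = 1 if x_i > x*, else 0
D = (x > x_star).astype(float)

# Model (2): regressors 1, x, D and x*D, estimated by OLS with no restriction
X2 = np.column_stack([np.ones(n), x, D, x * D])
alpha, beta, gamma, delta = np.linalg.lstsq(X2, y, rcond=None)[0]

# Left segment:  alpha + beta * x
# Right segment: (alpha + gamma) + (beta + delta) * x
# Vertical gap between the two segments at x*; OLS does not force it to zero
gap_at_x_star = gamma + delta * x_star
print(f"gap at x*: {gap_at_x_star:.4f}")
```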

This requirement amounts to the following restriction on the parameters of the model:

γ + δ x* = 0,

where x* is just a known number (30 in my example above).
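The restriction comes from evaluating the two segments of (2) at x = x* and requiring them to give the same value:

```latex
\begin{aligned}
\text{left segment } (D_i = 0):\quad & \alpha + \beta x^* \\
\text{right segment } (D_i = 1):\quad & (\alpha + \gamma) + (\beta + \delta)\, x^* \\
\text{join at } x^*:\quad & \alpha + \beta x^* = (\alpha + \gamma) + (\beta + \delta)\, x^* \\
\iff\quad & \gamma + \delta x^* = 0 .
\end{aligned}
```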

Using this restriction to eliminate γ from equation (2), we get:

y_i = α + β x_i + δ D_i(x_i - x*) + ε_i ; i = 1, 2,..., n . (3)
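Spelled out, the substitution is as follows. The new regressor D_i(x_i - x*) equals max(x_i - x*, 0), so it is zero to the left of the join-point and rises with slope one to the right of it:

```latex
\begin{aligned}
\gamma &= -\delta x^* \\
\gamma D_i + \delta (x_i D_i) &= -\delta x^* D_i + \delta x_i D_i = \delta D_i (x_i - x^*) \\
\Rightarrow\quad y_i &= \alpha + \beta x_i + \delta D_i (x_i - x^*) + \varepsilon_i .
\end{aligned}
```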

Here is the EViews output for my estimated regression model:

The EViews workfile is on the code page for this blog, and the data I used are available on the data page.
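For readers without EViews, here is a rough Python/NumPy analogue of estimating (3) by OLS. It is a sketch only: the data are simulated from hypothetical parameter values rather than taken from the blog's data page, and `np.linalg.lstsq` stands in for EViews' least squares routine.

```python
import numpy as np

rng = np.random.default_rng(1)
n, x_star = 100, 30.0

# Hypothetical simulated data (not the blog's data set)
x = np.sort(rng.uniform(0.0, 60.0, n))
y = 5.0 + 0.5 * x + 1.5 * np.maximum(x - x_star, 0.0) + rng.normal(0.0, 3.0, n)

D = (x > x_star).astype(float)

# Model (3): regressors 1, x and D*(x - x*)
X3 = np.column_stack([np.ones(n), x, D * (x - x_star)])
coef = np.linalg.lstsq(X3, y, rcond=None)[0]
alpha, beta, delta = coef

# Slope below x* is beta; slope above x* is beta + delta
print(f"alpha = {alpha:.3f}, slope below x* = {beta:.3f}, "
      f"slope above x* = {beta + delta:.3f}")
```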

We can then generate within-sample forecasts, separately, for observations 1 to 30, and observations 30 to 100. If these series are called YFORC1 and YFORC2, this is (part of) what we get:

Notice that YFORC1 = YFORC2 at observation 30, as required.
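The same check can be sketched in Python. Assumptions: the data are simulated with x = 1, 2, ..., 100 (so that observation 30 has x = 30), and `yforc1`/`yforc2` merely mirror the EViews series names. Writing the right-hand segment implied by (3) as its own line, with intercept α - δx* and slope β + δ, shows that it passes through the left-hand segment's value at x*.

```python
import numpy as np

rng = np.random.default_rng(2)
n, x_star = 100, 30.0

# Hypothetical data with x = 1, 2, ..., 100, so observation 30 has x = 30
x = np.arange(1.0, n + 1.0)
y = 5.0 + 0.5 * x + 1.5 * np.maximum(x - x_star, 0.0) + rng.normal(0.0, 3.0, n)

D = (x > x_star).astype(float)
X3 = np.column_stack([np.ones(n), x, D * (x - x_star)])
alpha, beta, delta = np.linalg.lstsq(X3, y, rcond=None)[0]

# Model (3) implies two straight lines:
#   left  (x <= x*): intercept alpha,              slope beta
#   right (x >  x*): intercept alpha - delta * x*, slope beta + delta
def left_line(t):
    return alpha + beta * t

def right_line(t):
    return (alpha - delta * x_star) + (beta + delta) * t

# Analogues of YFORC1 (observations 1-30) and YFORC2 (observations 30-100)
yforc1 = left_line(x[:30])
yforc2 = right_line(x[29:])

# Observation 30 (x = 30) is in both series; the two lines meet there
print(yforc1[-1], yforc2[0])
```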

If we then gather X, Y, YFORC1 and YFORC2 into a group, and produce a scatter-plot, here's the result we wanted:

So, it all comes down to the use of a dummy variable and a restriction on the regression coefficients. One without the other won't work.
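One way to see the dummy and the restriction working together is to impose γ + δx* = 0 directly on model (2) with the textbook restricted least squares formula, and compare the result with OLS on (3). The sketch below assumes NumPy and hypothetical simulated data. The two sets of estimates of α, β and δ should coincide, because (3) is just (2) with the restriction substituted in.

```python
import numpy as np

rng = np.random.default_rng(3)
n, x_star = 100, 30.0

# Hypothetical simulated data
x = np.sort(rng.uniform(0.0, 60.0, n))
y = 5.0 + 0.5 * x + 1.5 * np.maximum(x - x_star, 0.0) + rng.normal(0.0, 3.0, n)
D = (x > x_star).astype(float)

# Unrestricted OLS on model (2): b = [alpha, beta, gamma, delta]
X2 = np.column_stack([np.ones(n), x, D, x * D])
XtX_inv = np.linalg.inv(X2.T @ X2)
b = XtX_inv @ X2.T @ y

# Restriction gamma + delta * x* = 0, written as R b = r
R = np.array([[0.0, 0.0, 1.0, x_star]])
r = np.array([0.0])

# Restricted least squares: b_R = b - (X'X)^-1 R' [R (X'X)^-1 R']^-1 (R b - r)
adj = XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T) @ (R @ b - r)
b_restricted = b - adj

# OLS on the reparameterised model (3): [alpha, beta, delta]
X3 = np.column_stack([np.ones(n), x, D * (x - x_star)])
b3 = np.linalg.lstsq(X3, y, rcond=None)[0]

print("restricted (2):", np.round(b_restricted, 4))
print("OLS on (3):    ", np.round(b3, 4))
```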

Ryan commented on the post in question, and suggested that (in my notation) we estimate the model:

y_i = α (x_i - 30) + β D_i (x_i - 30) + ε_i .

This produces the following results:

Ryan gets the join-point alright, but the fit over the first sub-sample doesn't look very convincing. Sorry, buddy!
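Ryan's specification is model (3) with the intercept tied to -30α, so every fitted line it can produce passes through y = 0 at x = 30. Being a restricted version of (3), its residual sum of squares can never be smaller than that of (3). A NumPy sketch on hypothetical simulated data, where the true join-point does not sit at y = 0:

```python
import numpy as np

rng = np.random.default_rng(4)
n, x_star = 100, 30.0

# Hypothetical simulated data whose kink is not at y = 0
x = np.sort(rng.uniform(0.0, 60.0, n))
y = 5.0 + 0.5 * x + 1.5 * np.maximum(x - x_star, 0.0) + rng.normal(0.0, 3.0, n)
D = (x > x_star).astype(float)

# Ryan's model: no intercept, regressors (x - 30) and D*(x - 30);
# at x = 30 both regressors are zero, so the fitted value there is 0
Xr = np.column_stack([x - x_star, D * (x - x_star)])
coef_r = np.linalg.lstsq(Xr, y, rcond=None)[0]

# Model (3) for comparison: intercept, x and D*(x - x*)
X3 = np.column_stack([np.ones(n), x, D * (x - x_star)])
coef_3 = np.linalg.lstsq(X3, y, rcond=None)[0]

def ssr(X, coef):
    """Residual sum of squares of y on X for the given coefficients."""
    resid = y - X @ coef
    return float(resid @ resid)

print("SSR, Ryan's model:", round(ssr(Xr, coef_r), 2))
print("SSR, model (3):   ", round(ssr(X3, coef_3), 2))
```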

10 comments:

mark leeds, October 22, 2013 at 4:35 PM
thanks dave. very interesting. you allow for a changing slope at a known point using the introduction of a dummy variable and a second coefficient. ( dimitry: my mistake. it seems like you were close ) neat stuff. and it also seems like an approach that could be extended to multiple change points ( as long as you know what they are beforehand ) also.
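mark's extension to several known change points can be sketched by adding one term D_ij (x_i - x*_j) per knot, where D_ij = 1 if x_i > x*_j; each such term already has its own join restriction built in, exactly as in (3). The knots at 20 and 40 and the simulated data below are hypothetical, and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 120
knots = [20.0, 40.0]  # hypothetical change points, known in advance

# Hypothetical simulated data with a kink at each knot
x = np.sort(rng.uniform(0.0, 60.0, n))
y = (2.0 + 0.3 * x + 1.0 * np.maximum(x - 20.0, 0.0)
     - 2.0 * np.maximum(x - 40.0, 0.0) + rng.normal(0.0, 2.0, n))

# One regressor D_ij * (x_i - x*_j) per knot
hinges = [np.where(x > k, x - k, 0.0) for k in knots]
X = np.column_stack([np.ones(n), x, *hinges])
coef = np.linalg.lstsq(X, y, rcond=None)[0]

# Segment slopes: beta, beta + delta_1, beta + delta_1 + delta_2
slopes = coef[1] + np.concatenate([[0.0], np.cumsum(coef[2:])])
print("segment slopes:", np.round(slopes, 3))
```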

Anonymous, October 23, 2013 at 11:28 AM
This paper may be of some interest.
http://people.bu.edu/perron/papers/EJ-06.pdf
Unknown, October 23, 2013 at 7:47 PM
I maintain that unless you are looking for a regression that doesn't give you garbage results, my method is the clear winner, buddy.

Thanks for looking at my attempt. I checked the EViews code and played around, very educational. A nice way of exploring the algebra of forcing the regression surface through a fixed point, with some dummy variables intuition in there. I think this would be a great problem in most econometrics texts. Have your students already been subjected to this? I wonder about bias, and am curious if you have a DGP in mind for this problem.
David, October 24, 2013 at 3:38 AM
Thanks for the awesome blog, Dave!
If x* is unknown, one can find its least squares estimate by minimizing SSR over a set of candidate thresholds. I believe, in the current example such an estimate, i.e. argmin-SSR(x*), happens to be equal to 24. I hope I got this right.
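David's suggestion can be sketched as a grid search: for each candidate x*, estimate (3) by OLS and keep the candidate with the smallest SSR. This NumPy sketch rests on assumptions: the data are simulated (so it will not reproduce the value of 24 David mentions for the blog's data), and the candidate set is taken to be the interior sample values of x. Because the regressor D_i(x_i - x*) varies continuously with x*, the SSR minimiser can fall between observations, so a finer grid may do slightly better.

```python
import numpy as np

def ssr_model3(x, y, x_star):
    """SSR from OLS on model (3) for a given candidate join-point x_star."""
    D = (x > x_star).astype(float)
    X = np.column_stack([np.ones_like(x), x, D * (x - x_star)])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    return float(resid @ resid)

rng = np.random.default_rng(6)
n, true_kink = 100, 30.0

# Hypothetical simulated data (not the blog's data)
x = np.sort(rng.uniform(0.0, 60.0, n))
y = 5.0 + 0.5 * x + 1.5 * np.maximum(x - true_kink, 0.0) + rng.normal(0.0, 3.0, n)

# Candidate thresholds: interior sample values of x, trimming 5 at each end
# so that both segments always contain several observations
candidates = x[5:-5]
ssrs = np.array([ssr_model3(x, y, c) for c in candidates])
x_star_hat = candidates[np.argmin(ssrs)]
print(f"least-squares estimate of x*: {x_star_hat:.2f}")
```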
