Monday, October 21, 2013

A "Segmented" Regression Problem

Here's a little exercise for the students among you.

Suppose that we want to fit a least squares regression model that allows for a "break" in the underlying relationship at a particular sample value for the regressor(s). In addition, we want to make sure that the fitted model passes through that sample value.

In other words, we want to end up with a fitted model that gives a result such as this:

Here, the two segments of the regression line "join" when X=30. What's a simple way to achieve this?

© 2013, David E. Giles


  1. One method might be to partition the sample into every possible pair of contiguous segments. For each candidate split point, a least squares regression is fitted to each segment. The split with the smallest total squared error wins.

    Counter question: How do you know how many partitions are optimal?
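A minimal Python sketch of the partitioning idea above. It assumes a single break and at least three observations per segment; both are assumptions added here, not part of the comment, and the data are hypothetical. Fitting the two segments separately does not by itself force the fitted lines to join at the split.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def best_single_break(xs, ys, min_points=3):
    """Try every split of the x-sorted sample into two segments and
    return (last x of the left segment, total SSE) for the best split."""
    pairs = sorted(zip(xs, ys))
    sx = [p[0] for p in pairs]
    sy = [p[1] for p in pairs]
    best = None
    for k in range(min_points, len(sx) - min_points + 1):
        _, _, sse_left = fit_line(sx[:k], sy[:k])
        _, _, sse_right = fit_line(sx[k:], sy[k:])
        total = sse_left + sse_right
        if best is None or total < best[1]:
            best = (sx[k - 1], total)
    return best


# Hypothetical noise-free data: slope 2 up to x = 30, slope 0.5 afterwards.
xs = list(range(1, 101))
ys = [40 + 2 * x if x <= 30 else 100 + 0.5 * (x - 30) for x in xs]

break_x, total_sse = best_single_break(xs, ys)
```

With these noise-free data the observation at x = 30 lies on both lines, so the selected split lands at or immediately next to x = 30.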

  2. Here's my attempt.

    Let model a be Y_a = B_0a + B_1a * X (B_0a is the intercept and B_1a is the slope coefficient)
    Let model b be Y_b = B_0b + B_1b * X (B_0b is the intercept and B_1b is the slope coefficient)

    Estimate model a over x = 1:30 subject to Yhat_a(30) = 100 (approximately)
    Estimate model b over x = 30:100 subject to Yhat_b(30) = 100 (approximately)

    Not necessarily simple but it should work.

    estimate model 1) over x = 30:100 subject to Yhat(30) = 100 ( approximately )
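If the join point is treated as known, the constrained fits described above have a closed form: a line forced through (x0, y0) has only its slope left to estimate, and least squares gives slope = sum((x - x0)(y - y0)) / sum((x - x0)^2). The Python sketch below assumes the constraint holds exactly at (30, 100) rather than "approximately", and uses hypothetical data; the function and variable names are illustrative.

```python
def slope_through_point(xs, ys, x0, y0):
    """Least squares slope of a line constrained to pass through (x0, y0)."""
    num = sum((x - x0) * (y - y0) for x, y in zip(xs, ys))
    den = sum((x - x0) ** 2 for x in xs)
    return num / den


X0, Y0 = 30, 100  # join point from the comment, treated here as exact

# Hypothetical noise-free data: slope 2 up to x = 30, slope 0.5 afterwards.
xs = list(range(1, 101))
ys = [40 + 2 * x if x <= X0 else 100 + 0.5 * (x - X0) for x in xs]

# Segment a uses x = 1..30, segment b uses x = 30..100.
left = [(x, y) for x, y in zip(xs, ys) if x <= X0]
right = [(x, y) for x, y in zip(xs, ys) if x >= X0]

b_1a = slope_through_point([x for x, _ in left], [y for _, y in left], X0, Y0)
b_1b = slope_through_point([x for x, _ in right], [y for _, y in right], X0, Y0)

# Implied intercepts: intercept = Y0 - slope * X0 for each segment.
b_0a = Y0 - b_1a * X0
b_0b = Y0 - b_1b * X0
```

Because both lines are pinned to the same point, the two segments join at X = 30 by construction. The cost is that the height of the join (100 here) has to be supplied rather than estimated.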

  3. Regress y on x and a generated regressor x2 that is 0 when x<=30 and equal to (x-30) when x>30?
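The generated-regressor suggestion above can be sketched in Python as follows. The regressor x2 = max(x - 30, 0) is zero at the break, so the fitted line cannot jump there; the slope is b1 up to the break and b1 + b2 after it. The small `ols` helper (normal equations solved by Gaussian elimination) and the data are hypothetical and are there only to keep the example self-contained.

```python
def ols(rows, ys):
    """Least squares coefficients from the normal equations (X'X) beta = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, ys))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


BREAK = 30

# Hypothetical noise-free data: slope 2 up to x = 30, slope 0.5 afterwards.
xs = list(range(1, 101))
ys = [40 + 2 * x if x <= BREAK else 100 + 0.5 * (x - BREAK) for x in xs]

# Columns: constant, x, and the generated regressor x2 = max(x - 30, 0).
rows = [[1.0, x, max(x - BREAK, 0)] for x in xs]
b0, b1, b2 = ols(rows, ys)


def predict(x):
    """Fitted value; continuous at the break because x2 is zero there."""
    return b0 + b1 * x + b2 * max(x - BREAK, 0)
```

Here the height of the join is estimated from the data rather than imposed, and the coefficient b2 measures the change in slope at the break.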

  4. Fit the model:

    y = β1*(X-30) + D*β2*(X-30), without including a constant, where D = 1 if X > 30 and D = 0 otherwise. ????? R seems to back me up!

  5. Hi Dimitry: wouldn't that give you a surface rather than two lines? I may not be understanding. Thanks for

    Also, I left an errant line at the bottom of my attempt. My attempt should have ended at "it should work".

  6. I think we should add an intercept dummy and a slope dummy, where D is 1 when x > 30 and 0 otherwise, so the model becomes

    Y = a + b*X + c*D*X + d*D + e_t

    The coefficient on the slope dummy gives the change in the slope after the break.

    1. A slope dummy will place a jump-discontinuity into the model at X=30.

    2. Err. Just using a slope dummy (without re-centering the 'new' X variable), that is. An intercept dummy will also add a discontinuity into the model.
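The discontinuity point made in these replies can be checked numerically. In the model with both dummies, the two fitted lines evaluated at X = 30 differ by d + 30*c, and nothing in the estimation forces that gap to zero. The Python sketch below uses a hypothetical `ols` helper and hypothetical data (a continuous broken line plus an alternating +1/-1 disturbance) to illustrate this.

```python
def ols(rows, ys):
    """Least squares coefficients from the normal equations (X'X) beta = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, ys))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


BREAK = 30
xs = list(range(1, 101))

# Hypothetical data: continuous broken line plus an alternating disturbance
# (+1 for even x, -1 for odd x).
ys = []
for x in xs:
    line = 40 + 2 * x if x <= BREAK else 100 + 0.5 * (x - BREAK)
    ys.append(line + (1 if x % 2 == 0 else -1))

# Columns: constant, X, D*X, D, with D = 1 when X > 30 and 0 otherwise.
rows = []
for x in xs:
    dummy = 1.0 if x > BREAK else 0.0
    rows.append([1.0, x, dummy * x, dummy])
a, b, c, d = ols(rows, ys)

# Gap between the right-hand and left-hand fitted lines at the break point.
jump = d + c * BREAK
```

Even though the underlying line in these data is continuous at X = 30, the estimated `jump` is not zero, which is the discontinuity the replies describe.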

  7. Mark: yhat is a function of x only, so I don't think it qualifies as a surface.