Econometrics Beat: Dave Giles' Blog: Estimating & Simulating an SEM

Saturday, May 19, 2012

Estimating & Simulating an SEM

We all know that structural simultaneous equations models (SEM’s) played a key role in the historical development of Econometrics as a discipline. An understanding of these models and the associated estimators is an important part of our training, whether we use these models or not in our day-to-day work. The issues that they raise have helped shape much of our current econometric tool-kit.

I've posted on this topic before, but here I'm going to look at the results of applying various SEM estimators using the EViews econometrics package. In particular, I'll use a simple well-known structural model to illustrate the estimates that are obtained when different “limited information” and “full information” estimators are used.

Then, I'll take a look at using an estimated SEM for the purposes of simulating the effect of a policy shock.

The idea of constructing SEM’s for the macroeconomy came from Jan Tinbergen, among others. He estimated a 24-equation system for the Dutch economy in 1936. (See Tinbergen (1959, pp.37-84) for an English translation.) When the first Nobel Prize in Economic Science was awarded in 1969, Tinbergen shared the inaugural honour with Ragnar Frisch (a Norwegian econometrician) for their pioneering work that led to the development of econometrics as a recognized sub-discipline.

Lawrence Klein was (is) also a pioneer and long-term super-star in macroeconometric modelling, for which he also won a Nobel Prize - in 1980. His work influenced econometric modelling around the world, culminating with the ambitious Project LINK.

Klein’s (1950) “Model I” for the U.S. economy was a 6-equation SEM, comprising 3 structural equations and 3 identities. The equations of Klein’s model are given below, with the endogenous variables as the dependent variables in each case:

Consumption:
C_t = α₀ + α₁P_t + α₂P_{t-1} + α₃(W^p_t + W^g_t) + ε_{1t}

Investment:
I_t = β₀ + β₁P_t + β₂P_{t-1} + β₃K_{t-1} + ε_{2t}

Private Wages:
W^p_t = γ₀ + γ₁X_t + γ₂X_{t-1} + γ₃A_t + ε_{3t}

Equilibrium Demand:
X_t ≡ C_t + I_t + G_t

Private Profits:
P_t ≡ X_t - T_t - W^p_t

Capital Stock:
K_t ≡ K_{t-1} + I_t

The predetermined variables in the model are the intercept, G_t (government non-wage spending), T_t (indirect business taxes plus net exports), W^g_t (government wage bill), A_t (time trend, measured as years from 1931), and the lagged endogenous variables, P_{t-1}, X_{t-1} and K_{t-1}. Allowing for lags, the net sample period for the estimation of the model was 1921 to 1941 inclusive.

The data for Klein's model are on the data page for this blog, and the EViews workfile that we'll be using is on the code page. Note that the variable called “K1” is just K_{t-1}, and “W_t” is (W^p_t + W^g_t).

When we estimate each of the 3 structural equations in the model by OLS, this is what we get:

Of course, given the simultaneous nature of the model, and the fact that various current-period endogenous variables appear as regressors in other equations, we know that the OLS estimator is inconsistent in this context.

So, let's use a simple consistent estimator - Two Stage Least Squares (2SLS). This is just an instrumental variables (I.V.) estimation with all of the predetermined variables in the whole model used as the instruments. It's a "single equation" estimator, in the sense that it is applied one equation at a time.
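To make the mechanics concrete, here is a minimal Python sketch of 2SLS as two successive least-squares fits. It is not part of the EViews workfile. The data are simulated, and the variable names (x1, z1, z2), the coefficient values and the sample size are all assumptions chosen for illustration. They are not Klein's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # large simulated sample, so the asymptotic behaviour is visible

# Simulated predetermined variables (hypothetical, not Klein's data).
x1 = rng.normal(size=n)
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)

# Correlated errors make y2 endogenous in the y1 equation.
errors = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=n)
e1, e2 = errors[:, 0], errors[:, 1]

y2 = 2.0 + 1.0 * z1 + 1.0 * z2 + e2
y1 = 1.0 + 0.5 * y2 + 1.0 * x1 + e1  # true coefficient on y2 is 0.5

X = np.column_stack([np.ones(n), y2, x1])      # regressors of the y1 equation
Z = np.column_stack([np.ones(n), x1, z1, z2])  # all predetermined variables


def ols(y, X):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_sls(y, X, Z):
    """2SLS: regress X on Z, then regress y on the fitted X."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]  # first stage
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]   # second stage


b_ols = ols(y1, X)
b_2sls = two_sls(y1, X, Z)
print("OLS :", b_ols)
print("2SLS:", b_2sls)
```

With this set-up, OLS overstates the coefficient on y2 because y2 is correlated with the equation's error, while 2SLS gives a value close to the 0.5 used to generate the data.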

The results that we obtain are as follows:

If you compare the OLS and 2SLS estimates of the parameters, you'll see that in some cases there are some sizeable numerical differences. Also, in almost all cases the OLS standard errors are less than their 2SLS counterparts. By using OLS, you come away with a false sense of the precision of the estimated structural coefficients.

Next, I'm going to estimate the model by Three Stage Least Squares (3SLS) – this is a “full information” or “system” estimator that has the same asymptotic efficiency as Full Information Maximum Likelihood (FIML). The advantage of this estimator over 2SLS is that not only is it consistent, but in general it will be more efficient (asymptotically) than 2SLS, as it takes into account the presence of the other equations in the model. This is done by recognizing that there will be a (contemporaneous) covariance structure between the error terms in each of the structural equations. The 2SLS estimator ignores this extra information.

We have a pretty small sample here, so let's not get too excited about results that have only asymptotic validity! Moreover, there can be a down-side to using a "system" estimator such as 3SLS. If any one of the equations in the model is mis-specified, this will render the estimates of all of the coefficients in all of the equations inconsistent. So, we should be careful in our choice of estimator.
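For readers who like to see the algebra at work, here is a rough Python sketch of the three stages, again on simulated data. The two-equation system, its coefficients and the helper names (project, three_sls) are hypothetical choices of mine for illustration. This is not what EViews does internally, and it omits standard errors.

```python
import numpy as np


def project(Z, X):
    """Fitted values from regressing each column of X on the instruments Z."""
    return Z @ np.linalg.lstsq(Z, X, rcond=None)[0]


def three_sls(ys, Xs, Z):
    """3SLS point estimates for a list of equations sharing instruments Z."""
    n, m = Z.shape[0], len(ys)
    X_hats = [project(Z, X) for X in Xs]                       # stage 1
    b2 = [np.linalg.lstsq(Xh, y, rcond=None)[0]                # stage 2 (2SLS)
          for Xh, y in zip(X_hats, ys)]
    resid = np.column_stack([y - X @ b for y, X, b in zip(ys, Xs, b2)])
    sigma = resid.T @ resid / n                                # error covariance
    s_inv = np.linalg.inv(sigma)
    # Stage 3: GLS on the stacked system, weighting by the inverse covariance.
    A = np.block([[s_inv[i, j] * X_hats[i].T @ X_hats[j] for j in range(m)]
                  for i in range(m)])
    rhs = np.concatenate([sum(s_inv[i, j] * X_hats[i].T @ ys[j] for j in range(m))
                          for i in range(m)])
    return np.linalg.solve(A, rhs), b2, sigma


rng = np.random.default_rng(1)
n = 5000
x1, x2, x3 = rng.normal(size=(3, n))
e = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)

# Hypothetical structural form:
#   y1 = 1.0 + 0.5*y2 + 1.0*x1          + e1
#   y2 = 2.0 - 0.3*y1 + 1.0*x2 + 1.0*x3 + e2
B = np.array([[1.0, -0.5], [0.3, 1.0]])
rhs_data = np.vstack([1.0 + x1 + e[:, 0], 2.0 + x2 + x3 + e[:, 1]])
y1, y2 = np.linalg.solve(B, rhs_data)  # reduced-form solution for the data

ones = np.ones(n)
ys = [y1, y2]
Xs = [np.column_stack([ones, y2, x1]), np.column_stack([ones, y1, x2, x3])]
Z = np.column_stack([ones, x1, x2, x3])

beta_3sls, beta_2sls, sigma_hat = three_sls(ys, Xs, Z)
print("3SLS:", beta_3sls)
```

The stacked coefficient vector lists the first equation's parameters followed by the second's. The only thing that distinguishes the third stage from 2SLS is the weighting by the estimated error covariance matrix.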

To implement 3SLS, we first need to create the system we're going to use. In the EViews workfile, we select “Object”, “New Object”, “System”. I've named the system THREESTAGE. We lay out the specification of the structural equations in the model as follows:

(To make things easy for you, if you're going to reproduce these results, the code for these equations is stored in the text-object called “Three_Stage_Spec” in the EViews workfile.)

Then we select the “Estimate” tab and choose “Three-Stage Least Squares” as the estimation method:

Pressing “OK” gives us the 3SLS estimates:

You can compare the 3SLS estimates of the parameters (and their standard errors) with their 2SLS counterparts. I'm not going to dwell on these differences here, though.

Now let’s move ahead to FIML estimation of the model. In this case the 3 identities have to be “solved out” (substituted out) from the model in order for EViews to proceed. If you don't do this, then the endogenous variables won't be distinguished properly from the predetermined variables in the likelihood function, and you'll get the wrong estimates.
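As a sketch of what "solving out" involves, substitute the identities X_t ≡ C_t + I_t + G_t and P_t ≡ X_t - T_t - W^p_t into the three structural equations. The capital stock identity isn't needed, because only the predetermined K_{t-1} appears on the right-hand side. The exact arrangement of terms in the workfile's specification may differ from this, so treat it as the algebra only:

```latex
\begin{aligned}
C_t   &= \alpha_0 + \alpha_1\,(C_t + I_t + G_t - T_t - W^p_t) + \alpha_2 P_{t-1} + \alpha_3\,(W^p_t + W^g_t) + \varepsilon_{1t} \\
I_t   &= \beta_0 + \beta_1\,(C_t + I_t + G_t - T_t - W^p_t) + \beta_2 P_{t-1} + \beta_3 K_{t-1} + \varepsilon_{2t} \\
W^p_t &= \gamma_0 + \gamma_1\,(C_t + I_t + G_t) + \gamma_2 X_{t-1} + \gamma_3 A_t + \varepsilon_{3t}
\end{aligned}
```

After the substitution, the only current-period endogenous variables left are C_t, I_t and W^p_t, and each of them appears on the right-hand side of at least one equation.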

With a larger SEM, this set of substitutions would be very tedious – some other econometrics packages allow you to include identities explicitly as part of the model's specification, and the substituting out of the identities is done automatically for you. It seems that this isn't the case in EViews, unfortunately.

So, in the EViews workfile, I selected “Object”, “New Object”, “System”. I named the system FIML. Then I laid out the specification of the 3 (modified) structural equations in the system as follows:

(Again, to make things easy for you if you're planning on replicating this, these equations are stored in the text-object called “FIML_Spec” in the EViews workfile.)

Now we select the “Estimate” tab, choose “Full Information Maximum Likelihood” as the estimation method and then select the “Options” tab. I've altered the default settings as below (including setting 1,000 as the maximum number of iterations for the maximization algorithm):

and we obtain the following estimation results and “Gradients Summary”:

If you look at Greene (2012, p.333), or Greene (2008, p.385), you'll see a summary of the OLS, 2SLS, 3SLS and FIML results, together with some other estimates. The results there agree very closely with ours.

The estimates of the structural form parameters that we've now obtained are interesting in their own right, of course. However, we might also want to use our estimated SEM for the purposes of forecasting the endogenous variables, or for seeing how the predicted "time-path" of these variables is affected if one of the exogenous variables in the model is "shocked", to mimic a policy change of some sort.

To facilitate this, the next thing we have to do is to see how we can “solve” the estimated structural form of the SEM for the estimated restricted reduced form. In our case, the system is linear in both the endogenous variables and the parameters, so this can be achieved by straightforward matrix manipulations.
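Here is a small Python sketch of those matrix manipulations for the six equations of Klein's model, written as B y_t = Γ z_t so that the restricted reduced form is y_t = B⁻¹Γ z_t. The coefficient values and the vector z below are round-number placeholders that I've made up for illustration. They are not the estimates reported in this post, and the ordering of the variables is my own choice.

```python
import numpy as np

# Placeholder structural coefficients (illustrative only, not estimates).
a0, a1, a2, a3 = 16.0, 0.2, 0.1, 0.8     # consumption equation
b0, b1, b2, b3 = 10.0, 0.5, 0.3, -0.15   # investment equation
g0, g1, g2, g3 = 1.5, 0.45, 0.15, 0.13   # private wage equation

# Endogenous vector  y = [C, I, Wp, X, P, K]
# Predetermined      z = [1, G, T, Wg, A, P(-1), X(-1), K(-1)]
B = np.array([
    [1.0,  0.0, -a3,  0.0, -a1, 0.0],   # C - a1*P - a3*Wp = ...
    [0.0,  1.0, 0.0,  0.0, -b1, 0.0],   # I - b1*P = ...
    [0.0,  0.0, 1.0,  -g1, 0.0, 0.0],   # Wp - g1*X = ...
    [-1.0, -1.0, 0.0, 1.0, 0.0, 0.0],   # X - C - I = G
    [0.0,  0.0, 1.0, -1.0, 1.0, 0.0],   # P - X + Wp = -T
    [0.0, -1.0, 0.0,  0.0, 0.0, 1.0],   # K - I = K(-1)
])
Gamma = np.array([
    [a0,  0.0,  0.0, a3,  0.0, a2,  0.0, 0.0],
    [b0,  0.0,  0.0, 0.0, 0.0, b2,  0.0, b3],
    [g0,  0.0,  0.0, 0.0, g3,  0.0, g2,  0.0],
    [0.0, 1.0,  0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0,  0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
])

# Restricted reduced-form coefficient matrix: one row per endogenous variable.
Pi = np.linalg.solve(B, Gamma)

# Solve for one period, given made-up values of the predetermined variables.
z = np.array([1.0, 10.0, 7.0, 5.0, 0.0, 15.0, 60.0, 200.0])
y = Pi @ z
C, I, Wp, X, P, K = y
print(dict(zip(["C", "I", "Wp", "X", "P", "K"], y.round(3))))
```

Each column of Pi gives the impact effects of one predetermined variable on all six endogenous variables, which is exactly the information a policy simulation draws on.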

However, if the SEM were non-linear in the endogenous variables, this solution would have to be achieved iteratively as we would then have a system of non-linear equations to be solved. In that case, techniques such as the Gauss-Seidel method or Newton’s method would be used.
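To give a feel for the iterative approach, here is a bare-bones Gauss-Seidel sketch in Python. The two-equation non-linear "model" (a consumption function with a power term, plus an income identity) and its numbers are invented for this illustration. The function simply cycles through the equations, using each newly computed value straight away, until nothing changes by more than a tolerance.

```python
def gauss_seidel(equations, start, tol=1e-10, max_iter=1000):
    """Solve y_i = f_i(values) by cycling through the equations in order.

    equations: dict mapping an endogenous name to a function of the values dict.
    start:     dict of starting values, which also holds the exogenous variables.
    """
    values = dict(start)
    for iteration in range(1, max_iter + 1):
        largest_change = 0.0
        for name, rule in equations.items():
            new_value = rule(values)
            largest_change = max(largest_change, abs(new_value - values[name]))
            values[name] = new_value  # used immediately by later equations
        if largest_change < tol:
            return values, iteration
    raise RuntimeError("Gauss-Seidel did not converge")


# Hypothetical non-linear system: c = 1 + 0.8 * y**0.9 and y = c + g.
equations = {
    "c": lambda v: 1.0 + 0.8 * v["y"] ** 0.9,
    "y": lambda v: v["c"] + v["g"],
}
start = {"c": 1.0, "y": 6.0, "g": 5.0}  # g is exogenous and never updated

solution, n_iter = gauss_seidel(equations, start)
print(solution, n_iter)
```

Convergence isn't guaranteed in general. It depends on the model and on the order in which the equations are visited, which is why packages usually offer Newton-type alternatives as well.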

Note that this “solution” process has nothing to do with estimation – that's been done already. What we're now doing is converting the (estimated) structural form equations into the corresponding restricted reduced form equations so that we can either generate forecasts, or else perform policy simulations.

In EViews, there is a distinction between a "System" and a "Model". They are different types of Objects. What we've estimated is a System. We now have values (estimates) for the coefficients. There are no "unknowns". We now need to take this set of equations and store it in a form that can be manipulated. This is termed a Model.

So, first, I select “Object”, “New Object”, “Model”, and I'm going to name the new model FIML_CONTROL. There are various ways to get the estimated equations from the System into this Model. I think that the easiest way at this stage is to copy and paste our FIML System into the blank window for the FIML_CONTROL Model. We then see this:

When we click on the blue “S” logo, we see:

We can then scroll through the endogenous variables to see the specifications of the other two (structural) equations in the model. (Remember that the 3 identities were substituted out of the system.)

To solve the model, we select "OK", and then click on the “Solve" tab:

Notice (top, left) that we can choose between a "Deterministic" simulation and a "Stochastic" simulation. The first of these involves just solving out for the restricted reduced form equations, and setting the error terms to zero (their mean value). This will generate a single predicted time-path for each endogenous variable.

A "Stochastic" simulation, on the other hand, recognizes the presence of the error terms. Random drawings are made for the values of the error terms (you can choose how many), and then many time-paths are predicted for each endogenous variable. The mean and standard deviation of these paths are computed for every endogenous variable. What you then see is the mean path and a confidence band.
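The difference between the two types of simulation can be sketched in a few lines of Python. The single dynamic equation, its coefficients, the error standard deviation and the number of replications below are all hypothetical. The band shown is simply the mean plus or minus two standard deviations, which is my simplification and not necessarily how EViews constructs its bounds.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical equation: y_t = a + b*y_{t-1} + c*x_t + e_t
a, b, c, sigma = 1.0, 0.6, 0.5, 0.3
x = np.linspace(1.0, 3.0, 10)   # made-up exogenous path
y0 = 4.0                        # made-up starting value
n_reps = 2000                   # number of random drawings


def simulate(shocks):
    """Dynamic solution of the equation for a given sequence of errors."""
    path = np.empty(len(x))
    previous = y0
    for t in range(len(x)):
        previous = a + b * previous + c * x[t] + shocks[t]
        path[t] = previous
    return path


# Deterministic simulation: errors set to zero, giving a single path.
deterministic = simulate(np.zeros(len(x)))

# Stochastic simulation: many paths, summarised by their mean and spread.
draws = np.array([simulate(rng.normal(0.0, sigma, len(x))) for _ in range(n_reps)])
mean_path = draws.mean(axis=0)
sd_path = draws.std(axis=0, ddof=1)
lower, upper = mean_path - 2.0 * sd_path, mean_path + 2.0 * sd_path
```

Because this toy equation is linear, the mean of the stochastic paths sits very close to the deterministic path. With a non-linear model the two can differ.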

I'm just going to stick with a deterministic simulation here.

You'll also see (middle, left) that we can choose between a "Dynamic Solution" of the model, and a "Static Solution". These correspond to the dynamic and static forecasts that you can generate from an OLS regression if one or more lagged values of the dependent variable appear among the regressors.

In other words, a static solution always uses the actual values of lagged endogenous variables when generating the simulation time-paths. A dynamic solution uses the predicted (simulated) values of these variables. In practice, if we were predicting beyond the end of the sample, we'd have to use a dynamic solution, after the first prediction period. A dynamic solution is more "realistic".
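A tiny Python illustration of the distinction, using a made-up series and a hypothetical equation y_t = a + b*y_{t-1}:

```python
import numpy as np

a, b = 1.0, 0.7                                      # hypothetical coefficients
actual = np.array([3.0, 3.4, 3.1, 3.6, 3.3, 3.5])   # made-up "actual" data

# Static solution: every prediction uses the ACTUAL lagged value.
static = a + b * actual[:-1]

# Dynamic solution: after the first period, each prediction feeds on the
# PREDICTED value from the period before.
dynamic = np.empty(len(actual) - 1)
lag = actual[0]
for t in range(len(dynamic)):
    dynamic[t] = a + b * lag
    lag = dynamic[t]

print("static :", static)
print("dynamic:", dynamic)
```

The two solutions agree in the first period, because both start from the same actual value, and then drift apart as the dynamic solution accumulates its own prediction errors.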

For the record, the "Fit" option just reproduces the within-sample predictions, equation-by-equation, ignoring the fact that the equation is actually part of a system.

When we select “OK”, we see:

If you look at the main EViews workspace, you'll see that three new variables have been created. They are CONS_0, I_0, and WP_0. These are the simulated (predicted) values of the corresponding endogenous variables.

Now I'm going to select “Proc”, “Make Graph”, and edit the window so that it looks like this:

If I select “OK”, I get a set of graphs that compares the actual series for each variable with the time-path solved out from the model:

Why are there two lines on some of the graphs and only one on others? Well, in the case of variables that are exogenous there is just the (green) line for the actual data. It's only the endogenous variables that get predicted. In the latter cases there are blue lines as well, for the simulated/predicted values.

This is like looking at a plot of "Actual" and "Fitted" values for an estimated single equation regression. However, in our case, the "fitted" values take full account of the simultaneity of the system. The (within-sample) predicted values in the graphs above are produced by the restricted reduced form of the model.

What does the word “Baseline” refer to in the legends? It reflects that the simulated time-paths from the model are based on the same data that were used to estimate the system. Nothing has been tinkered with, in contrast to what we're about to see next.

Finally, let’s simulate the effect of a simple policy change. Specifically, we're going to see what the model predicts would have happened if Government Non-Wage Spending (G) had been 5 units larger (than it actually was) in each of the years 1937 to 1941 inclusive.

What follows shows how to conduct a dynamic/deterministic simulation and compare the “policy-on” (new "scenario") results with both the “policy-off” (“control”, or “baseline”) results and the actual data. You can experiment with other types of simulations.

In the Model window, when we select the “Scenarios” tab, we see:

What we need to do now is to create a second version of the variable G - one that incorporates the policy change. This will provide the information needed to simulate the model under a scenario different from the "baseline" case - called "Scenario 1", here.

To do this, we first create and highlight Scenario 1, as shown above, and press “OK”. Then, in the Model window, we select the “Variables” tab, and we see:

Next, we need to right-mouse-click on the variable “g”, and select “Properties”. We check the “override” box as shown below, and press the “Select Override = Actual” button:

A new variable, "G_1", has now been created in the Workfile. Right now, it's identical to the original "G" variable, but we're about to change that.

We edit the series “G_1” by increasing each of the last five values by 5 units (e.g., the 1941 value will now be 18.8):
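Outside EViews, the same override series could be built along these lines. This is a sketch only: apart from the 1941 figure, which is implied by the 18.8 quoted above, the values in g are placeholders of mine and not the actual data from the workfile.

```python
import numpy as np

years = np.arange(1921, 1942)    # 1921 to 1941 inclusive

# Placeholder series for G. Only the last value is taken from the post
# (18.8 after the change implies 13.8 before it); the rest are made up.
g = np.full(len(years), 10.0)
g[-1] = 13.8

# Override series: identical to G, then raised by 5 units for 1937 to 1941.
g_1 = g.copy()
shock_years = (years >= 1937) & (years <= 1941)
g_1[shock_years] += 5.0

print(g_1[-1])
```

Working on a copy leaves the original series untouched, which mirrors the way EViews keeps "G" for the baseline and "G_1" for the scenario.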

If we now solve the model, using "G_1" instead of "G", we'll be simulating a "policy-on" scenario. To do this, we select the “Solve” tab in the Model window and we see:

Selecting “OK” gives us:

The simulation has been completed, and now we want to see the results. So, we select “Proc”, “Make Graph” and edit the window as follows:

Finally, we select “OK”, and we have the graphs:

In these graphs we're able to compare the "Baseline" simulation of the model with the "Scenario 1" solution. For each of the three endogenous variables we see that when the exogenous variable, "G", is increased for the period 1937 to 1941, the predicted time-path changes. The Baseline simulation paths are in green and the "Policy-on" (Scenario 1) paths are in blue.

We see that an increase in Government expenditure leads to an increase in private consumption expenditure and private wages (in the top and bottom graphs, respectively). The impact on private fixed investment is more complicated (in the middle graph).
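The logic of the comparison (solve once with the baseline series, solve again with the override, then difference the two paths) can be sketched in Python with a deliberately tiny model. The two-equation system, its coefficients and the spending series are hypothetical and have nothing to do with the Klein estimates. The point is only the mechanics.

```python
import numpy as np

# Hypothetical model:  C_t = a + b*Y_t + c*Y_{t-1},   Y_t = C_t + G_t
a, b, c = 2.0, 0.5, 0.2
years = np.arange(1931, 1942)

g_baseline = np.full(len(years), 10.0)   # placeholder spending path
g_scenario = g_baseline.copy()
g_scenario[years >= 1937] += 5.0         # 5 extra units from 1937 onwards


def solve_dynamic(g, y_start=30.0):
    """Dynamic, deterministic solution for Y given a path for G."""
    y = np.empty(len(g))
    lag = y_start
    for t in range(len(g)):
        # Substituting the identity into the consumption function gives
        # Y_t = (a + c*Y_{t-1} + G_t) / (1 - b).
        y[t] = (a + c * lag + g[t]) / (1.0 - b)
        lag = y[t]
    return y


baseline = solve_dynamic(g_baseline)
scenario = solve_dynamic(g_scenario)
effect = scenario - baseline             # "policy-on" minus "policy-off"
print(dict(zip(years.tolist(), effect.round(2))))
```

The difference is zero before the shock and then builds up over time, because the lagged term carries part of each year's effect forward into the next.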

Let's look at a "blow up" of that chart, with a colour change to make things more visible:

Now, here's an interesting question. I wonder how these simulation results, based on the FIML estimation of Klein's model, compare with the results we'd have obtained if we (wrongly) used the OLS version of the model?

We can go back to the System I previously called FIML. I'm going to pull up that system again, but this time I'm going to use OLS estimation:

I'm going to leave you to go on from here. It's just a matter of repeating the steps that we've been through already, but now we have different estimates of the parameters of the structural form of the model, and so we'll get different simulation time-paths.

Have fun!

References:

Greene, W. H., 2008. Econometric Analysis, 6th ed. Pearson Prentice Hall, Upper Saddle River, NJ.

Greene, W. H., 2012. Econometric Analysis, 7th ed. Pearson Prentice Hall, Upper Saddle River, NJ.

Klein, L. R., 1950. Economic Fluctuations in the United States, 1921-1941. Wiley, New York.

Tinbergen, J., 1959. Selected Papers. L. H. Klaassen, L. M. Koyck and J. H. Witteveen (eds.). North-Holland, Amsterdam.

42 comments:

Anonymous, May 20, 2012 at 1:20 AM
Very helpful post Dr. Giles! However, I cannot find the attached EViews workfile in the Code Page?

Trevor Zink, October 24, 2012 at 10:11 PM
Excellent post! I wonder if you might help me where I got stuck. In my own dataset, I have three equations and one identity (which as you said I can't explicitly include). What I did was just left the identity out, and just as you promised, when I get to the step of "converting" the system to a model, eViews seems to think there are only 3 endogenous variables where there should be 4.
My question is that oddly, the coefficient estimates are identical to those from gretl, where I can specify the endogenous variables in a system. So even though eViews thought my price variable was exogenous, it still estimated everything fine--any idea what's going on here?

Trevor Zink, November 1, 2012 at 4:50 PM
In case anyone stumbles across this discussion, I've learned how to 'trick' EViews into making a variable endogenous. The key is to list it first in an equation, even if it doesn't belong first in that equation. For instance, to specify that price is endogenous in a standard supply/demand system, you could write:

price*0 + demand = f*(price + x)

See the discussion here: http://forums.eviews.com/viewtopic.php?f=10&t=6986#p24759

Anonymous, January 19, 2013 at 9:49 PM
Prof. Giles,

When I estimate a SEM by 2SLS, IV, or GMM do I need
to have a high r-squared and check for multicollinearity
to judge that the model is any good?

Thanks.

Anonymous, September 26, 2013 at 2:48 PM
prof Giles,
i am running a 3sls system:
cgt= c(1)+ c(2)*nb+ c(3)*sh+ c(4)*tc+ c(5)*oil+ c(6)*o_f
nb= c(7)+ c(8)*sh+ c(9)*tc+ c(10)*o_f+ c(11)*fx
sh= c(12)+ c(13)*nb+ c(14)*tc+ c(15)*lib+ c(16)*o_f
inst tc oil o_f lib fx

how do i know which instruments to use, because my results are not at all what i expected.

Unknown, February 20, 2014 at 3:34 AM
Dear prof Giles,
thank you very much for your great insights. I have a question about the consumption time series you used in this example. I ran a unit-root test and consumption definitively has a unit-root; however, you do not use first differences to make it stationary. Doesn't this affect the validity of the results? Thanks in advance!

Unknown, March 24, 2014 at 1:47 PM
Dear Dr. Giles,

Thank you for this insightful post. I was hoping you could help me with a SEM model I am trying to use for simulating the fiscal/labor impact of a labor demand shock to a county in Texas.

I am trying to run a 3sls estimation in STATA for the following 14 linear equation SEM model (it is a labor/fiscal impact model known as SAFESIM) for cross-sectional labor/fiscal data for all counties in Texas. Can I put all 14 equations in to one SEM model, assuming STATA says the equations are identified and meet order conditions? I am confused as based on what I have read this model contains equations that are not autonomous (i.e. for equation 2 it makes no sense to estimate the impact of place of work employment on population, holding net-commuting fixed, as the two variables should both change in response to the labor demand shock ). Another example is equation 14, where I cannot hold fixed the property tax base per student, while estimating the impact of a change in students on state funding to school districts (because the former explanatory variable is an accounting function of the latter explanatory variable).

Additionally, I am having trouble interpreting the results if I simulate an exogenous shock of 50 new workers to the actual place of work employment. Given that these are simultaneous equations for cross-sectional data where place-of-work employment and net-commuting should change in response to the shock, I am confused what to plug in to equation 2. Would I plug in to eq. 2 for place of work employment its initial value + 50 new workers, and net-commuting would be the actual net-commuting, prior to the shock? Or would I plug in to eq. 2 for net commuting the sum of actual net commuting + the predicted change to net commuting in eq 1 resulting from the 50-unit shock to place of work employment?

1. net-commuting = f (place-of-work county employment level, county unemployment level, rural dummy)
2. population = f (place-of-work county employment, net-commuting, rural dummy)
3. civilian labor force = f (population, unemployment level, rural dummy)
4. total school-age children in county = f (population, unemployment level, hispanic population level, rural dummy)
5. total county income= f (population, county earnings, net-commuting, rural dummy)
6. retail sales & service receipts = f (total income, net-commuting, rural dummy)
7. hotel receipts = f (total income, rural dummy)
8. mixed beverage receipts = f (total income, rural dummy)
9. total residential property value = f (total income, rural dummy)
10. total commercial property value = f (total income, rural dummy)
11. intergovernmental revenue = f (population, poverty rate, rural dummy)
12. county revenue = f (property value residential, prop. val. commercial, hotel + beverage receipts, rural dummy)
13. total county expenditures = f (population, total income, rural dummy)
14. total state funding to county school districts = f (total students, property tax base per student)

Thank you very much for your time.

Best,

Nabil

Unknown, April 20, 2014 at 9:53 PM
Respected Dave
your post is too informative. i need a clarification. if data have mix i.e I(0) and I(1) in this case can we apply 3sls on data?

Unknown, April 22, 2014 at 10:26 PM
Respected Dave Once again thanks for answering.If data is mix stationary i.e. I(0) and I(1) and system is simultaneous with error terms are correlated than what i can do?which estimator is proper? or how can i transform my data into the form which is suitable for 3sls?

Unknown, May 18, 2014 at 11:26 AM
Respected Dave
when i try to estimate 3sls the e-view8 give error message "Near Singular Matrix".
my system of equation are as:-
SE=-c(5)*CEt+c(4)*SEt+(1-c(4))*(c(1)*TR+c(2)*A1+c(3)*A2)
CE=c(4)*c(1)*TR+c(4)*c(2)*A1+c(4)*c(3)*A2-c(4)*SEt+c(5)*CEt
DE=(1-c(9))*((1-c(1))*TR+(1-c(2))*A1+(1-c(3))*A2)+c(9)*DEt
TR=(1-c(1))*c(7)*(DE-(1-c(2))*A1-(1-c(3))*A2)-c(1)*c(8)*(SE-SEt)+c(6)*TRt
inst c a1 a2 set(-1) cet(-1) trt(-1) det(-1)

what are the reasons why this error appear and whats its mean. And finally how i resolve this issue.

Sakura Iro, November 27, 2015 at 5:02 AM
Good day Professor Giles,

I noticed that you didn't run any diagnostic tests for your estimations. In particular, I'm interested in the diagnostic tests for 3sls. How do you know that your model is stable, free from heteroskedasticity or free from autocorrelation? I can't seem to find any option in eviews to check for these things.

Thanks in advance for your advice.

Sakura Iro, November 29, 2015 at 3:07 AM
Thanks a lot Professor Giles. I'll put your advice to good use.

Unknown, April 9, 2016 at 3:10 PM
Hi Professor Giles,
Thanks for such a great topic and sharing your expertise in econometrics among other subjects. Multiple topics have been of great assistance to me during the empirical analysis of my dissertation. I work from off campus. Thanks again for your hard work. Sean Byrne

Unknown, August 1, 2016 at 1:42 PM
Professor Giles,
Let's say I have a system of equations with no contemporaneous endogenous variable (I only have exogenous variables and lagged endogenous variables as regressors). In that case, do I have to run the methods you dwell on (2SLS/3SLS...)?
Thanks
Romain

BSloboda, May 28, 2017 at 5:19 PM
Professor Giles I stumbled on this website and I find your articles interesting. I am replicating Berndt and Wood (1975) for 1947-1971.

I prepared the following system Go_1=C(1)+c(2)* pk1 + c(3) * pl1 + c(4)* PE1+ c(5) *Pm1+.5*c(6)*(pk1*pk1)+c(7)*(pk1*pl1)+c(8)*(pk1+pe1)+c(9)*(pk1*pm1)+.5*c(10)*(pl1*pl1)+c(11)*(pl1*pe1)+c(12)*(pl1*pm1)+.5*c(13)*(pe1*pe1)+c(14)*(pe1*pm1)+.5*c(15)*(pm1*pm1)
@inst Pop work Excise prop gdur gndur Glabor realdu realndur cap (instruments)
K= c(16)+c(17)*pk1+c(18)*pl1+c(19)+pe1+c(20)*pm1
L= c(21)+c(22)*pk1+c(23)*pl1+c(24)+pe1+c(25)*pm1
E= c(26)+c(27)*pk1+c(28)*pl1+c(29)+pe1+c(30)*pm1
K, L, E are the market share equations. I left out the M equation. I am using 3sls and obtain a near singular matrix. I do not have same variables on both sides. I am stumped. Thanks for your inputs

Salem, May 31, 2017 at 4:30 PM
Respected prof. Pls how do i interpret the coefficient of estimates from TSLS in Eviews output?

Unknown, April 18, 2018 at 1:10 PM
Hello Sir,
Thanks a lot for writing such a useful post. However, i have a question about the stationarity of the time series. I am trying to estimate an equation in which dependent variable is stationary at 2nd difference while all other variables are stationary at level or 1st difference. Will it be appropriate to use 2SLS on such data? if not, which technique should i move towards.