# Grad Fellow Notes: Interaction Terms in Stata


Tommie Thompson: Georgetown MPP 2018

In regression analysis, it is often useful to include an interaction term between two variables. For instance, when testing how education and race affect wages, we might want to know whether education gives minorities a bigger wage boost than it gives whites. It’s possible that minority wages rise more for every additional “unit” of education than white wages do. To test this, we create a new variable equal to the product of the two terms and add it to the model:

Wage = β0 + β1Education + β2Minority + β3Education*Minority + ε

β3 tells us how the effect of education on hourly wage differs by race. If β3 > 0, then each additional unit of education raises hourly wages by more for minorities than for whites, controlling for the other predictors. This doesn’t mean that minorities have higher wages than whites (β2 captures the wage gap between the two groups at zero units of education), but that minorities derive more wage-generating value from education than whites.
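One way to see why β3 has this interpretation is to write out the effect of one more unit of education for each group. The derivation below is a sketch that assumes Minority is coded 1 for minorities and 0 for whites, which the model implies but the text does not state:

```latex
\begin{align*}
\frac{\partial\,\mathrm{Wage}}{\partial\,\mathrm{Education}} &= \beta_1 + \beta_3\,\mathrm{Minority} \\
\text{whites } (\mathrm{Minority} = 0): \quad \text{slope} &= \beta_1 \\
\text{minorities } (\mathrm{Minority} = 1): \quad \text{slope} &= \beta_1 + \beta_3
\end{align*}
```

The two slopes differ by exactly β3, so a positive β3 means a steeper education-wage relationship for minorities.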

Conducting analysis with interaction terms is straightforward in Stata. The most intuitive way to do so is to generate the interaction term as a new variable:

(2 missing values generated)

. reg wage grade i.race RacexEduc
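For readers who want to try the manual approach from scratch, here is a minimal sketch. It assumes Stata's bundled `nlsw88` example data, which may not be the dataset behind this note, and it uses two hypothetical variables, `minority` and `min_x_grade`, that do not appear in the original commands:

```stata
* Sketch only: the dataset and the names minority and min_x_grade are assumptions
sysuse nlsw88, clear

* Hypothetical 0/1 indicator: 1 = non-white, 0 = white (race is coded 1 = white in nlsw88)
generate minority = (race != 1) if !missing(race)

* Interaction term: years of education (grade) times the minority indicator
generate min_x_grade = minority * grade

* Main effects plus the hand-built interaction term
regress wage grade minority min_x_grade
```

The estimates discussed next come from the regression shown above, not from this sketch, so the numbers may differ.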

The output suggests that minorities gain 15 cents more per hour than whites for every additional year of education they receive, ceteris paribus, even though minorities make \$2.47 less per hour than whites overall. Although the coding for this output is relatively painless, Stata offers a quicker way to run models with interaction terms using hashtags:

As the figure shows, if one hashtag is used, Stata runs a model with only the interaction term. That is:

Wage = β0 + β1Education*Minority + ε

Running a model like this, however, is generally ill-advised. If we include only the interaction term without the main effects, the coefficient on the interaction term may absorb part of the effect of one of the main predictors. In other words, some of the effect we attribute to the interaction may really come from a main predictor “hiding” in the interaction term. If we include the main effects, the model holds them constant when estimating the interaction coefficient, so we get a cleaner view of the relationship between wages and the interaction of education and minority status. To include the main effects using hashtags, we can write them in as -reg wage grade i.race i.race#c.grade-. However, a simpler way is to use two hashtags:
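As a rough sketch of the equivalence, the two regressions below should return the same coefficients. The example assumes Stata's bundled `nlsw88` example data, which may not be the dataset behind this note, and the stored-estimate names `long_form` and `short_form` are purely illustrative:

```stata
* Sketch only: assumes Stata's bundled nlsw88 example data (wage, grade, race)
sysuse nlsw88, clear

* Main effects written out, plus the single-# interaction (the command from the text)
regress wage grade i.race i.race#c.grade
estimates store long_form

* The double ## expands to the main effects and the interaction
regress wage i.race##c.grade
estimates store short_form

* Side-by-side comparison of the two sets of coefficients
estimates table long_form short_form
```

The `c.` prefix marks grade as continuous and `i.` marks race as categorical, so Stata knows how to build the product terms.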