Chapter 12: Interaction terms

Smart coding of interaction terms

Binary predictors

Categorized predictors

Continuous predictors

Illustration in GUSTO-I

Age 50 or 70

Prevention of counterintuitive patterns

HRT effect at higher age only

Penalized estimation of interaction terms

Hepatitis B virus (HBV)

Smart coding of interaction terms

We may consider some ways of smart coding of interaction terms

Binary predictors

The smart coding of interaction terms considers different formulations of the standard terms in a regression model. The model for testing of interaction is y ~ x1 + x2 + x1*x2

This model has 3 df, including 1 df for the interaction x1*x2 (if x1 and x2 both have 1 df). If x1 is a binary variable, we can code x2 such that it represents the effects without x2 and with x2 respectively:

y ~ x1 + (1 – x1)*x2 + x1*x2

This model also has 3 df, but now the effects are directly estimated for x2 in the absence of x1 and the presence of x1.

The term (1 – x1)*x2 is the x2 effect if x1=0; the term x1*x2 is the x2 effect if x1=1.

Categorized predictors

Suppose we consider a categorical predictor with 7 levels, e.g. presenting problem in children with fever ('problem'). We are interested in the effect of temperature ('temp') on urgency (high vs low). The model for testing of interaction is

y ~ as.factor(problem) + temp + as.factor(problem) * temp

This model has 13 df of freedom. For easier interpretation of the effect of temp by problem we create 7 variables:

p1t <- ifelse('problem'==1, temp, 0)
p2t <- ifelse('problem'==2, temp, 0)

...

p7t <- ifelse('problem'==7, temp, 0)

We then fit the model

y ~ as.factor(problem) + p1t + p2t + ... + p7t

This model also has 13 df, but the effects of temperature are easier to interpret.

Continuous predictors

If x1 is a continuous predictor, we can estimate the effect of x2 at specific values of x1 by subtracting the x1 value at which we want to estimate the effect of x2. For example, when we examine a model with interaction between systolic blood pressure (SBP, range e.g. 60 – 200) and age (AGE, range e.g. 30 – 90 years), we could reformulate the standard model y ~ AGE + SBP + AGE*SBP :

y ~ (AGE-50) + SBP + (AGE-50)*SBP for the SBP effect at age 50, and
y ~ (AGE-70) + SBP + (AGE-70)*SBP for the SBP effect at age 70.

Illustration in GUSTO-I

We illustrate the smart coding in the n=785 subsample of GUSTO-I. We want to assess the effects of age among those with or without tachycardia (HRT 0 / 1). We calculate 2 age variables AGE0 and AGE1 to do so:

AGE0 <- gustos$AGE * (1 - gustos$HRT) # Age effect, No tachycardia
AGE1 <- gustos$AGE * gustos$HRT # Age effect, Tachycardia

The standard fit was

Coef S.E. Wald P

AGE 0.0759 0.024 3.16 0.0016
HRT=Tachycardia -3.637 2.835 -1.28 0.1995
AGE * HRT=Tachycardia 0.0655 0.040 1.64 0.1004
...

The smart coding gives us

Coef S.E. Wald P

AGE0 0.0759 0.024 3.16 0.0016
AGE1 0.1414 0.033 4.26 0.0000
HRT=Tachycardia -3.6376 2.835 -1.28 0.1995

...

In the standard fit, we have to add the effects of AGE and AGE*HRT to obtain the age effect in those with HRT. A confidence interval for the age effect in those with HRT requires use of the covariance matrix of the fit. With smart coding, the confidence intervals are obtained directly from the fit from the SE of each age effect. Note that the coefficient for AGE1 is the sum of the AGE and AGE*HRT effects in the standard fit.

Age 50 or 70

The interpretation of the HRT effect is at age zero; we can make that at age 50 as follows:

AGE0 <- (gustos$AGE-50) * (1-gustos$HRT)
AGE1 <- (gustos$AGE-50) * gustos$HRT

This coding leads to the following regression coefficients

Coef S.E. Wald P

AGE0 0.07587 0.02401 3.16 0.0016

AGE1 0.14140 0.03323 4.26 0.0000

HRT -0.3609 0.88541 -0.41 0.6835

...

For age 70, we subtract 70 instead of 50 from the age variable, and this leads to

Coef S.E. Wald P

Intercept -4.4323 0.52548 -8.43 0.0000

AGE0 0.07587 0.02401 3.16 0.0016

AGE1 0.14140 0.03323 4.26 0.0000

HRT 0.94971 0.33094 2.87 0.0041

...

So, we note that HRT is slightly protective at age 50 (coefficient -0.36, p=0.68) and a risk factor at age 70 (coefficient 0.95, p=0.004, Fig 12.1). Fig 12.1 also holds a warning against attempts for refined modeling with small data; the interaction term AGE*HRT has a reversed effect, if anything, in the full GUSTO-I data set (n=40,830).

Prevention of counterintuitive patterns

We can reformulate the interaction model to prevent this counterintuitive pattern.

AGE55min <- ifelse(gustos$AGE<55, gustos$AGE-55,0)

AGE55plusNoHRT <- ifelse(gustos$AGE<55|gustos$HRT==0,0,gustos$AGE-55)

AGE55plusHRT <- ifelse(gustos$AGE<55|gustos$HRT==1,0,gustos$AGE-55)

This coding leads to the following regression coefficients

Coef S.E. Wald P

AGE55min 0.08709 0.10706 0.81 0.416

AGE55plusNoHRT 0.07701 0.02491 3.09 0.002

AGE55plusHRT 0.14123 0.02577 5.48 0.000

HRT effect at higher age only

We now estimate a common age effect until age 55 (AGE55min coefficient 0.087 per year); from age 55 the age effect is 0.141 and 0.077 for those with and without tachycardia respectively. We do not incorporate a separate effect of HRT, since we assume a breakpoint at age 55. If we only assume a stronger effect over age 55, we can formulate a model with the following coefficients:

Coef S.E. Wald P

AGE55 0.07817 0.02097 3.73 0.0002

AGE55plusHRT 0.06409 0.01871 3.43 0.0006

...

where AGE55 = gustos$AGE-55 # age is zero at 55 years

We now have an age effect for the HRT=1 (‘Tachycardia’) group that starts at age 55, and is +0.064 stronger than the effect of 0.078 for the HRT=0 (‘No tachycardia’) group. Dividing age by 10 would make the coefficient interpretable as per decade older (Fig 12.2)

Penalized estimation of interaction terms

A noted in Fig 12.1, assessment of interactions may lead to a large number of formal and informal tests, and iterative adaptations in a prediction model. The model will then likely be overfitted to the data. Shrinkage techniques, such as penalized maximum likelihood estimation, may reduce the regression coefficients of selected interaction terms such that better predictions are obtained for future patients.

Hepatitis B virus (HBV)

Patients with chronic hepatitis B virus (HBV) infection can be treated with a specific type of interferon (with peginterferon alpha-2b). This leads to response (“HBeAg loss at 26 weeks”) in about a third of the patients. Buster et al aimed to predict response to treatment in individual patients with readily available patient characteristics.

A main effects model was considered with 5 predictors, including a predictor indicating the genotype (A, B, C, D). The researchers were specifically interested in the interaction between genotype (4 groups) and other predictors. When we try perform an overall test for interaction, we encounter a zero cell problem for one interaction term. An overall test for genotype by predictor interaction for 3 of the 4 other predictors is statistically non-significant (p=0.48). But an interaction between genotype and a liver enzyme (“ALT”) is statistically significant if considered in isolation.

Interactions

The effect of the genotype interaction is that ALT has no effect in genotype A and D, a small effect in genotype C, and a strong effect in genotype B (Fig 12.E1).

Fig 12.E1: Interaction effects in a prediction model for response in HBV patients, with standard and penalized maximum likelihood estimation (see Chapter 13). We note that the ALT effect is especially strong in genotype B patients, but that this effect is attenuated with penalized ML

Estimation

For estimation, we further consider a penalized maximum likelihood procedure, with separate penalties for main effects and interactions (see Chap 13). This leads to an interesting pattern for the optimal penalty combination. The expected future performance (according to Akaike’s Information Criterion, AIC) is better if interaction terms are penalized more than the main effects (Fig 12.E2).

The optimal combination is a penalty of 0.2 for main effects and 1.2 for interaction terms. Since the interaction was actually selected from a larger number of interaction terms, we might also search for the optimal penalty combination in a model with 3 interactions. This appears to lead to the combination 0, 1.4, implying no penalty for the main effects. An intermediate choice for the model with 1 interaction is to use the penalty combination 0.1, 1.4. This leads to slightly less extreme estimates for the interaction effects, especially for the ALT effect among genotype B patients (Fig 12.E1).

Fig 12.E2: Performance according to AIC for combinations of penalty factors for main effects and an interaction term in the prediction model for response in HBV patients. Better performance is indicated by a larger triangle. The largest triangle is found for the combination 0.2, 1.2 in the model with one interaction term for genotype*ALT

Performance

For performance, we consider the c statistics for the main effect model, and for the 2 variants of the interaction model (standard and penalized ML, Table 12.E1). The main effects model has an apparent c statistic of 0.69. With bootstrap validation, the c statistic becomes 0.65. The decrease by 0.04 is substantial, which is explained by the ratio of events per variable (84 events, 5 predictors with 7 df). When we extend the model with the interaction between genotype and ALT, the apparent c increases to 0.72. But the optimism is 0.05, making the model with interaction only slightly better than a main effects model. Using penalized ML led to very similar results, with a slightly lower apparent c statistics and slightly higher validated c statistics. The validation did however not include the search for potential interaction terms, and the search for a reasonable penalty factor. The final model can be presented as a nomogram, which allows for clear insight in the effect of the interaction between genotype and ALT (Fig 12.E3, see Chap 18).

Fig 12.E3 Nomogram presentation of the prediction model for response in HBV patients with penalized estimation of the interaction between ALT and Genotype. For calculation of the probability of response: Determine each predictor value, and draw a straight line upwards to the ‘Points’ axis to determine how many points towards response the patient receives. Sum the points received for each predictor and locate this sum on the ‘Total Points’ axis. Draw a straight line down to find the patient’s predicted probability of response; if the Total Points are less than 3, the probability of response is less than 20%; if over 10, the probability is over 90%.

R code

The essential R code is as follows for penalized estimation of interaction in hepatitis B virus response prediction example.

library(rms)# Fit a model with 3 interaction term in HBVs data set; 5 predictorsfulli2 <- lrm(HBEAG78X ~ as.factor(Genotype)*(HBVload+ALT+GGT)+PreviousTx, data=HBVs)# Overall test for interactionanova(fulli2)# Fit a model with 1 interaction term in HBVs data set; 5 predictorsfulli <- lrm(HBEAG78X ~ as.factor(Genotype)*ALT + HBVload+GGT+PreviousTx, data=HBVs) # Penalized ML; search penalty factors# 1 penalty; optimal value 0.3p <- pentrace(fulli, c(0,.2,.25, .3,.35,.4,.6,.8,1.2,2,4,10)) # separate penalty for interaction term; optimal combination 0.2, 1.2p2 <- pentrace(fulli, list(simple=c(0,.1,.2,.3),interaction=c(0,.4,.8,1,1.2,1.4,1.6)))# plot penalty combinationsplot(p2, which='aic.c') # Fig 12.E2# separate penalty for 3 interaction terms; optimal combination 0, 1.4p3 <- pentrace(fulli2, list(simple=c(0,0.01,0.02), interaction=c(0,1,1.2,1.4,1.6,2,4)))# manually change penalty to combination 0.1, 1.4choice.pen <- p2$penalty choice.pen[1:2] <- c(0.1,1.4)
# fit penalized logistic regression modelfulli.pen <- update(fulli, penalty=choice.pen)
# plot effects of interaction Fig 12.E1plot(fulli.pen, ALT=NA, Genotype=NA)
# present as nomogram Fig 12.E3nomogram(fulli.pen, fun=plogis, lp=F,lp.at=c(-2,0,2,4), ALT=c(2:6), fun.at=seq(.1,.9,by=.1), funlabel="p(response)", vnames="lab", maxscale=5)

Page updated

Google Sites

Report abuse