We may consider some ways of smart coding of interaction terms
The smart coding of interaction terms considers different formulations of the standard terms in a regression model. The model for testing of interaction is y ~ x1 + x2 + x1*x2
This model has 3 df, including 1 df for the interaction x1*x2 (if x1 and x2 both have 1 df). If x1 is a binary variable, we can code x2 such that it represents the effects without x2 and with x2 respectively:
y ~ x1 + (1 – x1)*x2 + x1*x2
This model also has 3 df, but now the effects are directly estimated for x2 in the absence of x1 and the presence of x1.
The term (1 – x1)*x2 is the x2 effect if x1=0; the term x1*x2 is the x2 effect if x1=1.
Suppose we consider a categorical predictor with 7 levels, e.g. presenting problem in children with fever ('problem'). We are interested in the effect of temperature ('temp') on urgency (high vs low). The model for testing of interaction is
y ~ as.factor(problem) + temp + as.factor(problem) * temp
This model has 13 df of freedom. For easier interpretation of the effect of temp by problem we create 7 variables:
p1t <- ifelse('problem'==1, temp, 0)
p2t <- ifelse('problem'==2, temp, 0)
...
p7t <- ifelse('problem'==7, temp, 0)
We then fit the model
y ~ as.factor(problem) + p1t + p2t + ... + p7t
This model also has 13 df, but the effects of temperature are easier to interpret.
If x1 is a continuous predictor, we can estimate the effect of x2 at specific values of x1 by subtracting the x1 value at which we want to estimate the effect of x2. For example, when we examine a model with interaction between systolic blood pressure (SBP, range e.g. 60 – 200) and age (AGE, range e.g. 30 – 90 years), we could reformulate the standard model y ~ AGE + SBP + AGE*SBP :
y ~ (AGE-50) + SBP + (AGE-50)*SBP for the SBP effect at age 50, and
y ~ (AGE-70) + SBP + (AGE-70)*SBP for the SBP effect at age 70.
We illustrate the smart coding in the n=785 subsample of GUSTO-I. We want to assess the effects of age among those with or without tachycardia (HRT 0 / 1). We calculate 2 age variables AGE0 and AGE1 to do so:
AGE0 <- gustos$AGE * (1 - gustos$HRT) # Age effect, No tachycardia
AGE1 <- gustos$AGE * gustos$HRT # Age effect, Tachycardia
The standard fit was
Coef S.E. Wald P
AGE 0.0759 0.024 3.16 0.0016
HRT=Tachycardia -3.637 2.835 -1.28 0.1995
AGE * HRT=Tachycardia 0.0655 0.040 1.64 0.1004
...
The smart coding gives us
Coef S.E. Wald P
AGE0 0.0759 0.024 3.16 0.0016
AGE1 0.1414 0.033 4.26 0.0000
HRT=Tachycardia -3.6376 2.835 -1.28 0.1995
...
In the standard fit, we have to add the effects of AGE and AGE*HRT to obtain the age effect in those with HRT. A confidence interval for the age effect in those with HRT requires use of the covariance matrix of the fit. With smart coding, the confidence intervals are obtained directly from the fit from the SE of each age effect. Note that the coefficient for AGE1 is the sum of the AGE and AGE*HRT effects in the standard fit.
The interpretation of the HRT effect is at age zero; we can make that at age 50 as follows:
AGE0 <- (gustos$AGE-50) * (1-gustos$HRT)
AGE1 <- (gustos$AGE-50) * gustos$HRT
This coding leads to the following regression coefficients
Coef S.E. Wald P
AGE0 0.07587 0.02401 3.16 0.0016
AGE1 0.14140 0.03323 4.26 0.0000
HRT -0.3609 0.88541 -0.41 0.6835
...
For age 70, we subtract 70 instead of 50 from the age variable, and this leads to
Coef S.E. Wald P
Intercept -4.4323 0.52548 -8.43 0.0000
AGE0 0.07587 0.02401 3.16 0.0016
AGE1 0.14140 0.03323 4.26 0.0000
HRT 0.94971 0.33094 2.87 0.0041
...
So, we note that HRT is slightly protective at age 50 (coefficient -0.36, p=0.68) and a risk factor at age 70 (coefficient 0.95, p=0.004, Fig 12.1). Fig 12.1 also holds a warning against attempts for refined modeling with small data; the interaction term AGE*HRT has a reversed effect, if anything, in the full GUSTO-I data set (n=40,830).
We can reformulate the interaction model to prevent this counterintuitive pattern.
AGE55min <- ifelse(gustos$AGE<55, gustos$AGE-55,0)
AGE55plusNoHRT <- ifelse(gustos$AGE<55|gustos$HRT==0,0,gustos$AGE-55)
AGE55plusHRT <- ifelse(gustos$AGE<55|gustos$HRT==1,0,gustos$AGE-55)
This coding leads to the following regression coefficients
Coef S.E. Wald P
AGE55min 0.08709 0.10706 0.81 0.416
AGE55plusNoHRT 0.07701 0.02491 3.09 0.002
AGE55plusHRT 0.14123 0.02577 5.48 0.000
We now estimate a common age effect until age 55 (AGE55min coefficient 0.087 per year); from age 55 the age effect is 0.141 and 0.077 for those with and without tachycardia respectively. We do not incorporate a separate effect of HRT, since we assume a breakpoint at age 55. If we only assume a stronger effect over age 55, we can formulate a model with the following coefficients:
Coef S.E. Wald P
AGE55 0.07817 0.02097 3.73 0.0002
AGE55plusHRT 0.06409 0.01871 3.43 0.0006
...
where AGE55 = gustos$AGE-55 # age is zero at 55 years
We now have an age effect for the HRT=1 (‘Tachycardia’) group that starts at age 55, and is +0.064 stronger than the effect of 0.078 for the HRT=0 (‘No tachycardia’) group. Dividing age by 10 would make the coefficient interpretable as per decade older (Fig 12.2)
A noted in Fig 12.1, assessment of interactions may lead to a large number of formal and informal tests, and iterative adaptations in a prediction model. The model will then likely be overfitted to the data. Shrinkage techniques, such as penalized maximum likelihood estimation, may reduce the regression coefficients of selected interaction terms such that better predictions are obtained for future patients.
Patients with chronic hepatitis B virus (HBV) infection can be treated with a specific type of interferon (with peginterferon alpha-2b). This leads to response (“HBeAg loss at 26 weeks”) in about a third of the patients. Buster et al aimed to predict response to treatment in individual patients with readily available patient characteristics.
A main effects model was considered with 5 predictors, including a predictor indicating the genotype (A, B, C, D). The researchers were specifically interested in the interaction between genotype (4 groups) and other predictors. When we try perform an overall test for interaction, we encounter a zero cell problem for one interaction term. An overall test for genotype by predictor interaction for 3 of the 4 other predictors is statistically non-significant (p=0.48). But an interaction between genotype and a liver enzyme (“ALT”) is statistically significant if considered in isolation.
The effect of the genotype interaction is that ALT has no effect in genotype A and D, a small effect in genotype C, and a strong effect in genotype B (Fig 12.E1).
Fig 12.E1: Interaction effects in a prediction model for response in HBV patients, with standard and penalized maximum likelihood estimation (see Chapter 13). We note that the ALT effect is especially strong in genotype B patients, but that this effect is attenuated with penalized ML
For estimation, we further consider a penalized maximum likelihood procedure, with separate penalties for main effects and interactions (see Chap 13). This leads to an interesting pattern for the optimal penalty combination. The expected future performance (according to Akaike’s Information Criterion, AIC) is better if interaction terms are penalized more than the main effects (Fig 12.E2).
The optimal combination is a penalty of 0.2 for main effects and 1.2 for interaction terms. Since the interaction was actually selected from a larger number of interaction terms, we might also search for the optimal penalty combination in a model with 3 interactions. This appears to lead to the combination 0, 1.4, implying no penalty for the main effects. An intermediate choice for the model with 1 interaction is to use the penalty combination 0.1, 1.4. This leads to slightly less extreme estimates for the interaction effects, especially for the ALT effect among genotype B patients (Fig 12.E1).
For performance, we consider the c statistics for the main effect model, and for the 2 variants of the interaction model (standard and penalized ML, Table 12.E1). The main effects model has an apparent c statistic of 0.69. With bootstrap validation, the c statistic becomes 0.65. The decrease by 0.04 is substantial, which is explained by the ratio of events per variable (84 events, 5 predictors with 7 df). When we extend the model with the interaction between genotype and ALT, the apparent c increases to 0.72. But the optimism is 0.05, making the model with interaction only slightly better than a main effects model. Using penalized ML led to very similar results, with a slightly lower apparent c statistics and slightly higher validated c statistics. The validation did however not include the search for potential interaction terms, and the search for a reasonable penalty factor. The final model can be presented as a nomogram, which allows for clear insight in the effect of the interaction between genotype and ALT (Fig 12.E3, see Chap 18).
Fig 12.E3 Nomogram presentation of the prediction model for response in HBV patients with penalized estimation of the interaction between ALT and Genotype. For calculation of the probability of response: Determine each predictor value, and draw a straight line upwards to the ‘Points’ axis to determine how many points towards response the patient receives. Sum the points received for each predictor and locate this sum on the ‘Total Points’ axis. Draw a straight line down to find the patient’s predicted probability of response; if the Total Points are less than 3, the probability of response is less than 20%; if over 10, the probability is over 90%.
The essential R code is as follows for penalized estimation of interaction in hepatitis B virus response prediction example.
library(rms)# Fit a model with 3 interaction term in HBVs data set; 5 predictorsfulli2 <- lrm(HBEAG78X ~ as.factor(Genotype)*(HBVload+ALT+GGT)+PreviousTx, data=HBVs)# Overall test for interactionanova(fulli2)# Fit a model with 1 interaction term in HBVs data set; 5 predictorsfulli <- lrm(HBEAG78X ~ as.factor(Genotype)*ALT + HBVload+GGT+PreviousTx, data=HBVs) # Penalized ML; search penalty factors# 1 penalty; optimal value 0.3p <- pentrace(fulli, c(0,.2,.25, .3,.35,.4,.6,.8,1.2,2,4,10)) # separate penalty for interaction term; optimal combination 0.2, 1.2p2 <- pentrace(fulli, list(simple=c(0,.1,.2,.3),interaction=c(0,.4,.8,1,1.2,1.4,1.6)))# plot penalty combinationsplot(p2, which='aic.c') # Fig 12.E2# separate penalty for 3 interaction terms; optimal combination 0, 1.4p3 <- pentrace(fulli2, list(simple=c(0,0.01,0.02), interaction=c(0,1,1.2,1.4,1.6,2,4)))# manually change penalty to combination 0.1, 1.4choice.pen <- p2$penalty choice.pen[1:2] <- c(0.1,1.4)