Ch 20: Updating for a new setting

Justification for shrinkage to prior model: 20.4.1

A theoretical motivation for shrinkage to a previously develop model was developed by Van Houwelingen, who argued that a shrunken model can be seen as a weighted average of the fitted model and a null model without predictors in the traditional shrinkage approach (SiM 2000; SiM 2004). After recalibration, we can formulate the shrunken model as a weighted average of the fitted model and a recalibrated model. The shrunken regression coefficients βshrunk can be written as

βshrunk = βcal + s * gamma = βoverall * βorig + s * (βnew – βoverall * βorig),

where βcal is the recalibrated regression coefficient from multiplying the overall calibration slope (βoverall) with the original coefficient value (βorig); s is the shrinkage factor; and gamma is the difference in re-estimated coefficient (βnew) and recalibrated coefficient βcal. If s = 0 (severe shrinkage), the shrunken coefficient is the recalibrated value; if s = 1 (no shrinkage), the new coefficient value is retained. The value for the shrinkage factor can be determined from the increase in model performance over simple recalibration (SiM 2004).

Simulation results for updating strategies: 20.5.1

Extensive simulation studies were performed within the GUSTO-I data set. The TIMI-II model was developed in 3339 patients and updated with method 1 to 8 (Table 20.1) in validation sets of varying size (n=200, n=500, n=1000, n=2000, n=5000, n=10,000). The updated models were further tested in independent patients from GUSTO-I (SiM 2004).


Calibration-in-the-large was a problem for the validity of the TIMI model in GUSTO-I. This was solved by updating the intercept (all methods 2 to 8). The calibration slope was close to 1 for the TIMI model in GUSTO-I. Without shrinkage, updating led to an average slope below one, reflecting optimism is estimation of the updated coefficients. This miscalibration was especially severe for re-estimation of the 8 predictor model in small samples (method 5); e.g. n=200, slope=0.59; n=500, slope=0.82 on average. Shrinkage largely resolved this problem, with average slopes close to one for the re-estimated models (n=200, slope=0.93; n=500, slope=0.98). In larger validation samples, all methods led to average calibration slopes being close to 1.

Discriminative ability

Discriminative ability of the TIMI-II model was by definition not affected by the re-calibration methods 2 or 3, which leave the rank order of predictions unchanged. The c statistic was 0.785. Discrimination was adversely affected by model re-estimation or model revision when validation samples had relatively small sizes (n ≤ 500). For example, c decreased on average to 0.742 by method 5 with n=200, and to 0.771 with n=500. With shrinkage, the decrease was less (to 0.775 for n=200). An improvement in average c (by at most 0.01) was only seen with validation samples of at least 1000 patients, and with updating including shrinkage. Without shrinkage, c did not improve with re-estimation unless n>=2000. Further details are presented elsewhere

Updating and sample size

Updating is expected to be more beneficial if the previous model is based on a small sample size. We hereto simulated the situation that the TIMI-II model was developed in n=500 instead of n=3339. The average slope of the linear predictor was then around 0.8, reflecting a need for shrinkage, consistent with the small development sample size. Updating of the slope (method 3) solved this problem. The discriminative ability was also hampered by the smaller size of the development data set (average c around 0.75 for methods 1 to 3, in contrast to 0.785 for the original TIMI-II model). Still, re-estimation with n=200 led to a lower c statistic (0.742). A more satisfactory performance was obtained with methods 5, 7, or 8 for validation sample sizes n ≥ 500, especially when combined with shrinkage.

IPI case study: Fig 20.7

Here is the R code for Fig 20.7

IPIdata <-

matrix(data=c(1, 2, 0.84,

2, 2, 0.66,

3, 2, 0.54,

4, 2, 0.34,

1, 5, 0.73,

2, 5, 0.51,

3, 5, 0.43,

4, 5, 0.26),

nrow=8, ncol=3, byrow=T,


# Make a factor as in Van Houwelingen paper, reference is IPI==4

IPIdata$IPIr <- as.factor(4 - IPIdata$IPI)

fit <- ols(log(-log(surv))~IPIr+ log(tfup), data=IPIdata)

# Exactly as published in Stat Med 2000; 19; page 3404

# Fig 20.7 #

IPIpred <- predict(fit, cbind(c(rep(1,101),rep(2,101),rep(3,101),rep(4,101)), rep(seq(0,10,0.1),4)))

plot(x=seq(0,10,0.1), y=exp(-exp(IPIpred))[1:101], axes='n', type='l', lty=4,lwd=3, ylim=c(0,1),col=mycolors[7])

axis(side=1, at=c(0,2,5,10))

mtext('Time (years)', cex=1.2, side=1, line=2.5)

axis(side=2, las=1)

mtext('Fraction surviving', cex=1.2, side=2, line=2.5)

lines(seq(0,10,0.1), y=exp(-exp(IPIpred))[102:202], lty=3,lwd=3,col=mycolors[2])

lines(seq(0,10,0.1), y=exp(-exp(IPIpred))[203:303], lty=2,lwd=3,col=mycolors[3])

lines(seq(0,10,0.1), y=exp(-exp(IPIpred))[304:404], lty=1,lwd=3,col=mycolors[4])

text(x=rep(8,4),y=c(.75,.5,.3,.1), c("IPI=1","IPI=2","IPI=3","IPI=4"), col=mycolors[c(4,3,2,7)])

points(x=IPIdata[c(1,5),2], y=IPIdata[c(1,5),3], pch=0, cex=1.5, col=mycolors[4], lwd=2)

points(x=IPIdata[c(1,5),2], y=IPIdata[c(1+1,5+1),3], pch=1, cex=1.5, col=mycolors[3], lwd=2)

points(x=IPIdata[c(1,5),2], y=IPIdata[c(1+2,5+2),3], pch=3, cex=1.5, col=mycolors[2], lwd=2)

points(x=IPIdata[c(1,5),2], y=IPIdata[c(1+3,5+3),3], pch=8, cex=1.5, col=mycolors[7], lwd=2)

# End Fig 20.7 #

Parametric recalibration: 20.7.5

Instead of recalibration with a Cox regression model, Van Houwelingen also describes how we can use an exponential model or a Weibull model |(Stat Med 2000;19:3401-15). The baseline cumulative hazard function of the IPI-Weibull model is defined as:

H0, model (t) = –ln(S0, model (t)) = exp(–0.319 + 0.439 * ln(t) ) = 0.727 * t 0.439.

We use this transformation for the time T in the validation sample:

T* = 0.727 * T 0.439. This transformed time T* follows an exponential distribution if the IPI -Weibull model is valid. An assessment of calibration-in-the-large is possible in various ways. We can use an exponential survival model with log(T*) = α + PI,
where α refers to a constant that controls the level of the log(hazard), adjusted for the IPI effects in the prognostic index (PI, based on the IPI coefficients as defined in section 20.7.4).
A simple alternative is to directly compare the number of observed deaths to the number predicted (correcting for censoring): α = 0.436 (SE 0.06). This is equivalent to a hazard ratio of exp(0.436) = 1.55. Hence, the overall survival was 1.55 times worse in the validation sample than in the development sample, adjusted for IPI score.

Recalibration can also be assessed with an exponential model for with the PI as the single predictor:

log(T*) = α + β * PI.
Some may prefer recalibration with a Weibull model, which allows for a different shape of the baseline hazard in the validation sample:

log(T*) = α + β * PI + γ * e.

Here α refers to a constant that controls the level of the log(hazard), β refers to the effect of the prognostic index PI, based on the IPI coefficients, and γ controls the shape of the hazard function, and e indicates the exponential distribution. If γ = 1, the shape of the baseline hazard function is maintained in the validation data. If α = 0, β = –1, and γ = 1, the calibration is perfect. The ratio of –β and γ (– β / γ) is the calibration slope. Recalibration of the IPI-Weibull model results in estimates of α= –0.24 (SE 0.06), β = –0.68 (SE 0.07), and γ =0.65 (SE 0.03) 456. Since γ is clearly different from 1, the shape of the hazard function is different in the validation data than estimated with the IPI-Weibull model. We hence cannot simply adjust the baseline hazard from the IPI-Weibull model by a constant factor, such as an hazard ratio of 1.55. This is consistent with the finding of different values for α when recalibration is done by time points (section 20.7.3).

In the proportional hazards interpretation of the Weibull model the calibration slope for the linear predictor is –β / γ = 0.68/0.65=1.04. This is in line with the Cox recalibration model (see 20.7.4), which also indicated that the predictive effect of the IPI was remarkably similar in the validation setting.

The Weibull recalibration procedure updates the IPI survival predictions by estimating 3 parameters. The parameters α and γ are used to update the baseline survival; β is used to recalibrate the IPI effect. It is a parsimonious (and hence attractive) alternative to recalibration with a Cox model.

More extensive model updating is obviously possible. For example we can reweight the 5 components of the IPI (method 4 in Table 20.1). This is actually similar to e.g. reweighting components of a comorbidity summary score as discussed in Chapter 9. The predictor Age>60 had a statistically significant stronger effect than assumed in the IPI. We can also update a model with inclusion of non-proportionality of the hazards between prognostic groups. For further details see papers by Van Houwelingen (SiM 1995; SiM 2000); Royston (2013).