Chapter 18: Presentation formats

Construction of score chart

1. Multiply and round regression coefficients of binary predictors and dummy variables of categorical predictors

2. Search for score for continuous predictors

3. Estimate the common multiplication factor

4. Estimate the intercept, check c statistic, and present as score chart

R script for testicular cancer case study

Derive a table with risk estimates (Br J Cancer 1996)

Graph for impact of mass size (Radiology 2000)

Meta-model with tree presentation (Int J Cancer 1999)

Construction of score chart

Below we discuss the construction of the score chart.

1. Multiply and round regression coefficients of binary predictors and dummy variables of categorical predictors

The first step is to multiply regression coefficients and round them to scores. A simple approach is to multiply coefficients by 10. But we can also search for smaller rounded scores. For example, the coefficients of the binary predictors Teratoma, Pre.AFP, and Pre.HCG were quite similar (approximately 0.8 for the penalized coefficients, Table 18.4). We multiply by 10/8 to give these 3 predictors each a score of 1. In general, we can often find lower numbers for multiplication that still allow for a refined prediction.

The R command for a search of scaling factor for the penalized coefficients was:

for (i in seq(1, 10, by=0.5)) {
cat("Multiply by:", 10/i, "i=", i, "Coefs:", round(full.pen$coefficients[-1] * 10 / i), "\n") }

2. Search for score for continuous predictors

Three continuous predictors are considered in the prediction model: LDH, size, and reduction. We decided for a log transformation (LDHst), square root transformation (Post.size), and linear (Reduction in size during treatment). These continuous predictors need to be rescaled in such a way that a 1 point in score corresponds approximately to a 10/8 increase in log odds.

We first treat these predictors as continuous variables, and later consider categorization as a further simplification. We go through several steps: a) score 0: convenient? With a predictor such as age, it is often strange to have a score of 0 at 0 years. We may need to change the reference point to a sensible value, which we subtract from the original value.
Score points: We aim to find values of the predictors where the scores are +1, +2, etc points, depending on the distribution of predictor values and the predictor effects.
Intermediate scores: We have to think about intermediate values, which have e.g. a score of +0.5 points. Other values can then be scored by linear interpolation.

We consider these 3 steps for the 3 continuous predictors in the testicular cancer histology prediction model (LDH, postchemotherapy size, and reduction in size).

log(LDHst)

A score of zero is obtained for log(1), i.e. when LDH values are equal to the upper limit of normal. This is convenient as a reference. The log transformed variable happens to have a penalized coefficient of 0.88, which can be rounded directly to 1 point when multiplied by 10/8 (0.88 * 10/8 = 1.1). Hence, when the natural log = 1, the score is one point. The LDHst value then is exp(1) = 2.7. So a LDH values 2.7 times the upper limit of normal have a score of 1. The score of 1.1 was actually somewhat larger than 1. Hence, we can set 1 point at 2.5 times normal, and a score of zero at 1 times normal levels. A score of 2 points is achieved at 2.5*2.5 is approximately 6 times normal levels. A more general approach is to study the relation between the score and the corresponding LDH values. We calculate the score (rounded at 1 decimal) for the LDHst values. We note that intermediate levels of LDH are common; the median value was 1.36 times normal. A score of +0.5 is found at 1.6 times normal LDH; we show this value in the score chart. A score of +1.5 is found at 4 times normal LDH. Scores to be shown for LDHst are 0.6, 1, 1.6, 2.5, 4, and 6 times upper limit of normal, with scores of -0.5, 0, 0.5, 1, 1.5, and 2 respectively.

sqrt(size)

In the nomogram, a postchemotherapy size of 50 mm obtains a score of zero, such that smaller sizes have positive points. In the score chart we can choose a similar approach, but prefer a size of 10 mm corresponding to zero. The 10 mm size is clinically the most interesting, since many centers do not perform surgery for patients with masses smaller than 10 mm. Masses smaller than 10 mm are considered ‘normal size’ by radiologists. We cannot define Post.size with negative numbers however, since we want to study the square root of Post.size. Hence, we subtract – 10mm from Post.size only after the model was fitted. Based on the figure below, we choose the following sizes to be shown with scores: <5mm, +0.5; 10mm, 0; 20mm, –0.5; 40mm, –1, and 70mm, –1.5 point.

I(reduction)

A reduction of 0% is associated with 0 points, which is convenient. The relationship is simply linear (see figure below). However, in the data we noted that none of the patients had necrosis when there was an increase in mass size during chemotherapy. We choose the following points: <0%: –1, 0%, 0; 50%, 1; 100%, 2 points.

3. Estimate the common multiplication factor

After finding a suitable set of weights we need to find the multiplication factor for the scores. In the testicular cancer model we round after multiplication with the factor 10/8 in a logistic model. To compensate for this multiplication, the common multiplication factor was found to be 0.86 in a fit with this rescaled linear predictor. This factor can be multiplied with the shrinkage factor (in this case: 0.95) to obtain shrunk predictions (shrunk.beta=0.81).

The calculation is as follows , where we omit the intercept from the set of coefficients:

score.fit <- lrm(NEC~Teratoma+Pre.AFP+Pre.HCG+LDHr+SQPOSTr+REDUC5, data=n544,x=T,y=T)
rounded.lp <- score.fit$x %*% rep(1,6) # All scores a weight of 1

# multiplier makes the rescaled factors for logistic formula
multiplier <- lrm.fit(y=score.fit$y, x=rounded.lp)$coef[2] # this finds: 0.86
shrunk.beta <- 0.95 * multiplier # shrinkage * 0.86
shrunk.beta # shrunk multiplier for better predictions, 0.81

4. Estimate the intercept, check c statistic, and present as score chart

We estimate the intercept corresponding to the scores, using the rounded coefficients multiplied with the shrunk.beta coefficient as an offset variable:

lrm.fit(y=score.fit$y, offset=shrunk.beta*rounded.lp) # The formula is: lp = –1.94 + 0.81*score.

We check the deterioration in discriminative performance. The c statistic of the original model was 0.839; uniform shrinkage does not affect this value. We find that the c statistic with rounded scores and using continuous predictors is only 0.001 lower (0.838).

The final score chart can be constructed in several ways. Especially, the presentation of values for continuous predictors is possible with scores horizontally, or vertically. A more general approach is to use the rms::nomogram() function

R script for testicular cancer case study

Several variants to present the testicular cancer model are shown in this R script, starting with the n544 data set. Enjoy!

Derive a table with risk estimates (Br J Cancer 1996)

## make a score from 0 - 5 for a simple table

## and leave postsize on one axis; exclude increase in size (low p(nec))

score5 <- n544$Teratoma+n544$Pre.AFP+n544$Pre.HCG+n544$PRELDH+n544$REDUCr

score5 <- ifelse(n544$REDUCr<0,0,score5)

describe(score5)

# Simple coding: 5 categories for postsize (no difference 20-30 and 30-50 mm)

n544$POST5 <- ifelse(n544$SQPOST<=sqrt(10),0,

ifelse(n544$SQPOST<=sqrt(20),1,

ifelse(n544$SQPOST<=sqrt(30),2,

ifelse(n544$SQPOST<=sqrt(50),3,4))))

POST5 <- n544$POST5[n544$REDUCr>-1]

y <- n544$NEC[n544$REDUCr>-1]

x <- ftable(as.data.frame(cbind(score5,POST5,y)))

ftable(x, col.vars = c(1,3))

# Predictions from simplified lrm model

full.simple2 <- lrm(y~as.factor(POST5)+score5,x=T,y=T,se.fit=T)

# Calculate predicted probabilities + 95% CI

x <- cbind(full.simple2$x,

round(plogis(full.simple2$linear.predictors),2),

round(plogis(full.simple2$linear.predictors-1.96*full.simple2$se.fit),2),

round(plogis(full.simple2$linear.predictors+1.96*full.simple2$se.fit),2))

unique(x[order(full.simple2$x[,1],full.simple2$x[,2],full.simple2$x[,3],full.simple2$x[,4]),])

## C stat for this Table; which makes 5 groups (4 cut-offs in ROC curve)

n544$simple.cat <- ifelse(n544$POST5==3 & score5==4,70,

ifelse(n544$POST5==3 & score5==5,80,

ifelse(n544$POST5==2 & score5==3,60,

ifelse(n544$POST5==2 & score5==4,80,

ifelse(n544$POST5==2 & score5==5,90,

ifelse(n544$POST5<2 & score5==2,60,

ifelse(n544$POST5<2 & score5==3,70,

ifelse(n544$POST5<2 & score5==4,80,

ifelse(n544$POST5<2 & score5==5,90, 50)))))))))

describe(n544$simple.cat)

rcorr.cens(n544$simple.cat, n544$NEC)

#########################################

Graph for impact of mass size (Radiology 2000)

## graph pre - post with lines for score 1 - 4

score4 <- n544$TER+n544$PREAFP+n544$PREHCG+n544$PRELDH

lrm(n544$NEC ~ as.vector(score4) + as.vector(n544$REDUC10))

full.simple3 <- lrm.fit(y=n544$NEC,x=cbind(score4,n544$SQPOST,n544$REDUC10))

full.simple3

# reduc = (pre-post) / pre; reduc = 1 - post/pre; reduc-1 = -post/pre; pre = -post/(reduc-1)

n544$presize <- - n544$SQPOST^2 / ifelse((n544$REDUC10/10 - 1) != 0,(n544$REDUC10/10 - 1),-.04)

describe(n544$presize)

# Now calculate iso probability lines: e.g. prob -70%,

# depending on postsize (horizontal), and presize (vertical)

cbind(n544$presize, n544$SQPOST^2, plogis(full$linear.predictor))

# calculate PRESIZE from SQPOST, for given probabilities

presize = f(sqpost, plogis(full))

pre = -post/(reduc10-10) # for reduc10<10

# range sqpost from 2 to 50; score 4 0 - 4

full.simple3$coef

x <- as.matrix(cbind(rep(0:4,49), rep(sqrt(2:50),5)))

lp.simple3 <- x %*% full.simple3$coef[2:3] + full.simple3$coef[1]

lp.simple3

# now calculate reduc10 with condition p=70% etc

# Solve equation qlogis(.7) = lp.simple3 + full.simple3$coef[4] * n544$REDUC10

reduc10.70 <- (qlogis(.7) - lp.simple3) /full.simple3$coef[4]

# Calculate presizes: pre = -post/(reduc10/10-1)

presize.70 <- - rep(2:50,5) / (reduc10.70/10 - 1)

# in 1 formula for efficiency

presize.90 <- - rep(2:50,5) / (((qlogis(.9) - lp.simple3) /full.simple3$coef[4])/10 - 1)

presize.80 <- - rep(2:50,5) / (((qlogis(.8) - lp.simple3) /full.simple3$coef[4])/10 - 1)

presize.60 <- - rep(2:50,5) / (((qlogis(.6) - lp.simple3) /full.simple3$coef[4])/10 - 1)

presize.50 <- - rep(2:50,5) / (((qlogis(.5) - lp.simple3) /full.simple3$coef[4])/10 - 1)

x <- as.matrix(cbind(rep(0:4,49), rep(sqrt(2:50),5),rep(2:50,5),lp.simple3,reduc10.70,

presize.70,presize.90,presize.80,presize.60,presize.50 ))

colnames(x) <- Cs(score4, sqpost,postsize,lp,reduc1070,

presize70,presize90,presize80,presize60,presize50)

x <- as.data.frame(x)

x[1:10,]

x[x<0] <- NA

# Try to make some plots; isoprobability lines for different scores, x-axis=postsize

par(mfrow=c(2,2))

for (i in 1:4) {

plot(y=x[x$score4==i,"postsize"],x=x[x$score4==i, "presize50"], main=paste("Score=",i,sep=""),

ylab='Postchemo size (mm)', xlab="Prechemo size (mm)",

xlim=c(0,100), ylim=c(0,50),type="p", pch="50%", axes=F,las=1)

axis(side=2,at=c(0,10,20,30,50))

axis(side=1,at=c(0,20,50,100))

points(y=x[x$score4==i,"postsize"],x=x[x$score4==i, "presize60"], pch="60%")

points(y=x[x$score4==i,"postsize"],x=x[x$score4==i, "presize70"], pch="70%")

points(y=x[x$score4==i,"postsize"],x=x[x$score4==i, "presize80"], pch="80%")

points(y=x[x$score4==i,"postsize"],x=x[x$score4==i, "presize90"], pch="90%")

} # end loop over score 1 - 4

#########################################

Meta-model with tree presentation (Int J Cancer 1999)

## Start with dichotomizing predictions from full as <70% vs >=70%

n544$predfull <- plogis(full$linear.predictor)

n544$predhigh <- ifelse(plogis(full$linear.predictor)<.7,0,1)

rcorr.cens(n544$predhigh, n544$NEC)

table(n544$predhigh, n544$NEC)

# So: low prediction, NEC 129:273; high prediction, NEC 116:26

273 /(273+26) # sensitivity for residual tumor/teratoma

116 /(116+129) # specificity for necrosis

# Make tree model for this outcome

library(rpart)

options(digits=3)

tree.orig <- rpart(NEC ~ Teratoma+Pre.AFP+Pre.HCG+LDHst+Post.size+Reduction,

data=n544)

plot(tree.orig)

text(tree.orig, use.n=F)

tree.meta <- rpart(predhigh ~ Teratoma+Pre.AFP+Pre.HCG+LDHst+Post.size+Reduction, data=n544)

plot(tree.meta)

text(tree.meta, use.n=F)

## Make smooth tree presentation; classification with reduction, teratoma, AFP

score3 <- as.numeric(n544$Reduction>70)+as.numeric(n544$Teratoma==1)+as.numeric(n544$Pre.AFP==1)

n544$tree.cat <- ifelse(n544$Reduction>50 & score3>=2,1,0)

describe(n544$tree.cat)

rcorr.cens(n544$tree.cat, n544$NEC)

table(n544$tree.cat, n544$NEC)

# So: low prediction, NEC 134:264; high prediction, NEC 111:35

264 /(264+35) # sensitivity for residual tumor/teratoma

111 /(111+134) # specificity for necrosis

## End illustrations of presentation formats