How to plot survival curves with Plotly and Altair
All three of the major Python survival analysis packages—convoys, lifelines, and scikit-survival—show how to plot survival curves with Matplotlib. In some cases, they bake Matplotlib-based plots directly into trained survival model objects, to enable convenient one-liner plot functions.
The downside of this convenience is that the code is hidden, so it's harder to customize or use a different library. In this article, I'll show how to plot survival curves from scratch with both Altair and Plotly. All of the following code lives in a live, executable Jupyter notebook on Binder and the source file can be found in the Crosstab Kite gists repo.
Survival curves describe the probability that a subject of study will “survive” past a given duration of time. This article assumes you're familiar with the concept; if this is the first you've heard of it, lifelines and scikit-survival both have excellent explanations.
Lifelines introduces survival curve estimation with an example about the
tenure of political leaders, using a duration table dataset that's
included in the package. Each row represents a head of state; the columns of interest are:
- duration, which is the length of each leader's tenure, and
- observed, which indicates whether the end of each leader's time in office was observed (it would not be observed if that leader died in office or was still in power when the dataset was collected).
      ctryname                  ehead                duration  observed
1022  Mauritania                Mustapha Ould Salek  1         1
1565  Switzerland               Ruth Dreifuss        1         1
763   Ireland                   Eamon de Valera      2         1
1722  United States of America  Bill Clinton         8         1
1416  Somalia                   Abdirizak Hussain    3         1
A fitted lifelines Kaplan-Meier model has a method,
plot_survival_function, that uses Matplotlib. It's certainly
convenient, but because it hides all the logic, it's harder to see how to customize the plot or
implement it in a different library.
How to plot a survival curve with Altair
Here's how to generate the same plot from scratch with Altair. There are three things we need to do:
- Process the lifelines model output.
- Plot the survival curve, as a step function.
- Plot the 95% confidence band, as the area between the lower and upper bound step functions.
The fitted lifelines Kaplan-Meier model has two Pandas DataFrames:
survival_function_ and confidence_interval_. We need to combine these into a single
DataFrame to make the Altair plot. We also need to convert the index into a column so we
can reference it as the X-axis.
   timeline  KM_estimate  lower_bound  upper_bound
0  0.0       1.000000     1.000000     1.000000
1  1.0       0.721792     0.700522     0.741841
2  2.0       0.601973     0.578805     0.624308
3  3.0       0.510929     0.487205     0.534126
4  4.0       0.418835     0.395233     0.442242
Now we can construct Altair plot objects: first the survival curve as a line mark, then
the confidence band as an area mark on top of the line. The only trick is that we use
interpolate='step-after' in both the line and area marks to
create the correct step function.
How to plot a survival curve with Plotly
It's slightly trickier to draw the same plot with Plotly because Plotly's confidence
band solution is a bit funky. First, we set up the figure and add the survival curve as
a line plot. We specify shape='hv' in the line parameters to get the correct step function.
Note that we don't need to create a plot DataFrame here because we're going to draw each Series as a standalone Plotly trace.
Next, we add traces for the upper and lower bounds of the confidence band, separately and in that order. The order matters: the lower bound trace uses fill='tonexty', which, despite the name, fills the area up to the trace added just before it in the code (here, the upper bound).
Finally, we add axis titles and styling, then show the plot. I've omitted some styling here for brevity.
Notes & references
- Listing image by Abdul A on Unsplash.