Joint modelling versus composite endpoints

19 May 2018

Papers: Multitype events and the analysis of heart failure readmissions
              Frailty modelling for multitype recurrent events in clinical trials
Programs: PaulBrownPhD/mtre


Introduction

In other blog posts I have described composite endpoints. A statistician may feel uneasy about composites that whittle multiple variables down into a single value intended to quantify the totality of treatment benefit. A loss of power is likely, tied to arbitrary decisions embedded within the definition of the composite. There seems to be a necessary trade-off in this regard, with an increase in statistical efficiency coinciding with a decrease in clinical relevance, and vice versa (roughly speaking). In other words, clinical composites sit at one end of a spectrum; at the other end is multivariate modelling, the more natural choice as far as the statistician is concerned.

That composites leak statistical power may not be readily discerned, because power estimates based on composites can be crude and unreliable; data simulations are required, especially when the number of component outcomes is large (>3, say)[ref]. Guesstimates of the correlations among outcomes must be assumed for the simulation; these are obtainable from previous study data or registry data. Power estimates should also consider a range of plausible effect sizes; however, it can be difficult to anticipate discordant effects (qualitative heterogeneity), and the more outcomes included in the composite, the greater the risk of discordant effects and a loss of power[ref] or ambivalent results. In any case, statistical power should not be the driving factor when selecting a composite[ref]; the ultimate justification has to be a clinical one. Thus it feels disingenuous to claim a composite has been employed to enhance power (a claim often made), especially when the alternatives may offer superior power yet are paid little heed.
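
As a purely hypothetical sketch of such a simulation for two correlated binary components, the SAS skeleton below dichotomises a latent bivariate normal at the assumed event rates; all rates, effect sizes and the latent correlation (rho) are guesstimates for illustration, not values from any study.

%let nsim = 1000;  %let n = 200;  %let rho = 0.3;  * guesstimated latent correlation ;

data sim;
   call streaminit(20180519);
   do simrep = 1 to &nsim;
      do id = 1 to &n;
         trt = (id > &n/2);                            * 1:1 allocation ;
         /* latent bivariate normal, dichotomised at the assumed rates */
         z1 = rand('normal');
         z2 = &rho*z1 + sqrt(1 - &rho**2)*rand('normal');
         p1 = ifn(trt, 0.14, 0.20);   * component 1: assumed rates by arm ;
         p2 = ifn(trt, 0.11, 0.15);   * component 2 ;
         y1 = (z1 < probit(p1));
         y2 = (z2 < probit(p2));
         composite = max(y1, y2);     * any-versus-none composite ;
         output;
      end;
   end;
run;

/* power = proportion of replicates in which the composite test rejects */
ods exclude all;
proc freq data=sim;
   by simrep;
   tables trt*composite / chisq;
   ods output ChiSq=chi(where=(Statistic='Chi-Square'));
run;
ods exclude none;

data power;  set chi;  reject = (Prob < 0.05);  run;
proc means data=power mean;  var reject;  run;

The same skeleton extends to more components, and puts candidate composites and models on an equal footing when comparing power.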


Neglected alternatives

One might think our impulse would be to use multivariate methods on multivariate data, but instead we find ourselves discoursing on the best way to compress multivariate data into a nonparametric univariate analysis. However, alongside the expanding literature that simultaneously promotes and condemns composites, we are beginning to see some researchers explicitly rejecting the use of a composite in favour of modelling in their clinical trials[ref1, ref2]. And now, when composites have become the default thinking, we are encouraged to consider whether joint modelling remedies the issues tied to composites or merely introduces problems of its own. After all, when we choose a composite for the primary analysis we are implicitly dismissing the alternative.

The characteristic feature of this alternative approach is the simultaneous modelling of correlated outcomes that collectively measure disease progression (assuming for the moment that the outcomes are of the same type). There is no intermingling or prepping of outcomes, and no consequent loss of information, as with composites. Thus separate estimates of the treatment effect, and an assessment of heterogeneity, follow directly from the model (reporting these statistics has been widely recommended as essential for the interpretation of composites[ref1, ref2, ref3, ref4], although they may often be absent[ref]). The model could assume a common effect across outcomes if this were deemed plausible. Otherwise an estimate of the overall effect could be calculated as a contrast of the individual estimates, thereby incorporating weights. Unlike with composites, the weighting is not inherent (i.e. a consequence of the algorithm for deriving the composite); it is instead applied after the model has been fitted and is therefore made explicit. This is important given the subjectivity of weighting outcomes, e.g. patients and clinicians may prioritise outcomes differently[ref].
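
For concreteness, this is standard linear-contrast arithmetic (the notation here is generic, not taken from the papers cited):

\hat{\theta} = \sum_{k=1}^{K} w_k \hat{\beta}_k = w^\top \hat{\beta},
\qquad
\widehat{\mathrm{Var}}(\hat{\theta}) = w^\top \hat{V} w,

where \hat{\beta}_k is the estimated treatment effect for outcome k (e.g. a log odds ratio), \hat{V} is the estimated covariance matrix of \hat{\beta} = (\hat{\beta}_1, \ldots, \hat{\beta}_K)^\top, and w holds the explicit, pre-specified weights. If the weights sum to one, exp(\hat{\theta}) is a weighted geometric mean of the individual odds ratios.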

Since both approaches yield an estimate of overall benefit, it is instructive to compare the results. For example, Mascha et al. contrasted a population-average (generalised estimating equation, GEE) model with the any-versus-none composite for multiple binary outcomes: complications classified by organ system for patients undergoing surgery[ref]. An odds ratio was estimated for the composite and a weighted average odds ratio was derived from the GEE model (see the example SAS code below). The latter was more extreme, i.e. further from 1, and statistically significant, while the composite odds ratio was not (the p-value shifted from 0.169 to 0.023). Advantages of the modelling approach noted by the authors include "use of more information per subject, ability to apply clinical importance weights, and in most cases greater statistical power". Unlike for the GEE model, power for the composite was sensitive to the baseline frequencies, which are difficult to anticipate; hence powering on a GEE model when designing a study may be preferable[ref].
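
For orientation, here is a minimal sketch of such a GEE analysis in SAS, assuming the data are in long format with one row per patient (id) per component (outcome), a binary indicator (event) and treatment arm (trt); the dataset and variable names are hypothetical, not those of Mascha et al.

proc genmod data=long descending;
   class id trt outcome;
   model event = trt outcome / dist=bin link=logit;
   repeated subject=id / type=unstr;      * within-patient correlation ;
   estimate 'common treatment OR' trt 1 -1 / exp;
run;

This common-effect model reports a single odds ratio across components; replacing the common effect with a trt*outcome interaction instead yields one log odds ratio per component, and an ESTIMATE statement whose coefficients are the clinical importance weights then returns the weighted average odds ratio described above.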

Often our data include times to adverse events, rather than merely binary indicator variables. In this case the GEE model could be replaced by a random effects model with individual patient effects (frailties) that follow an assumed distribution; see the link to our paper above. We analysed mortality and heart failure related readmissions, the latter classified as emergency department visits and hospitalisations. Random effects for these event types were assumed to follow a multivariate Normal distribution and the model was implemented in SAS (see the code below; in our second paper linked above we give alternative software options for this model). Popular composites were included for comparison, namely time-to-first, the unmatched win-ratio, and days-alive-and-out-of-hospital. By bootstrapping the study data, it was shown that the random effects model offered considerably more power. The model also allowed for an assessment of the associations among outcomes (i.e. between mortality, emergency department visits and re-hospitalisations), which is missing from a composite analysis. Other authors have discussed composites and modelling for time-to-event data, e.g. Wu & Cook (time-to-first versus the Wei, Lin & Weissfeld marginal model)[ref] and Rogers et al. (win-ratio versus a joint frailty model)[ref], both of whom make a case for the more thorough analysis, with the treatment effect underestimated by the composite.
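
A deliberately simplified sketch of how such a model can be expressed in PROC NLMIXED follows, using the Poisson-likelihood form of an exponential hazard with one row per patient (id) per event type (k = 1, 2, 3), event count y, log at-risk time logtime and treatment arm trt. Names and starting values here are illustrative; the programs actually used are in the repository linked above.

proc nlmixed data=events qpoints=5;
   parms a1-a3=-2 b1-b3=0 s11=1 s22=1 s33=1 s12=0 s13=0 s23=0;
   bounds s11 s22 s33 > 0;
   /* log hazard for each event type, each with its own frailty */
   if      k = 1 then eta = a1 + b1*trt + u1;    * mortality ;
   else if k = 2 then eta = a2 + b2*trt + u2;    * emergency department visit ;
   else               eta = a3 + b3*trt + u3;    * re-hospitalisation ;
   ll = y*(eta + logtime) - exp(eta + logtime);  * Poisson log-likelihood kernel ;
   model y ~ general(ll);
   random u1 u2 u3 ~ normal([0,0,0], [s11,s12,s22,s13,s23,s33]) subject=id;
   estimate 'HR mortality'   exp(b1);
   estimate 'HR ED visit'    exp(b2);
   estimate 'HR readmission' exp(b3);
run;

The off-diagonal parameters (s12, s13, s23) capture the associations among the event types referred to above, which a composite analysis discards.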

Sometimes, in addition to time-to-event data, we have a longitudinal outcome (typically a biomarker)[ref]. In this scenario a global rank composite[ref] or a random effects joint model[ref] could be used. A group from Utrecht made such a comparison and showed, using data simulations, once again that the joint model offers superior statistical power[ref]. Such joint modelling of outcomes has also been shown to reduce bias and lead to more efficient estimates, implying a smaller required sample size, with the gain depending on the strength of the association between the biomarker and the time-to-event outcome[ref]. Random effects modelling has been described for more eclectic collections of outcomes[ref]; however, we may wish to turn our attention to latent variable models in that case. For example, Teixeira-Pinto & Mauri used a latent variable model to analyse outcomes after coronary stenting[ref] and highlighted the advantages over a composite, with particular attention given to missing data[ref]. The modelling approach cannot, however, provide a meaningful estimate of overall benefit across disparate outcomes, and a strong case could be made for composites under these circumstances. Gardiner has described SAS code for such models[ref].
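
As a final hypothetical sketch, a shared random-effect joint model of a biomarker and a survival time can be written in PROC NLMIXED by stacking the two outcomes (dtype = 1 for biomarker rows, dtype = 2 for one survival row per patient, with resp holding the biomarker value or the event indicator respectively). This is a generic illustration, not the model of any paper cited above.

proc nlmixed data=stacked;
   parms a0=0 a1=0 b0=-3 b1=0 gamma=0 s2u=1 s2e=1;
   bounds s2u s2e > 0;
   if dtype = 1 then do;                       /* longitudinal submodel */
      mean = a0 + a1*time + u;
      ll = -0.5*(log(2*constant('pi')*s2e) + (resp - mean)**2/s2e);
   end;
   else do;                                    /* survival submodel */
      loghaz = b0 + b1*trt + gamma*u;          * association through u ;
      ll = resp*loghaz - exp(loghaz)*stime;    * resp = event indicator ;
   end;
   model resp ~ general(ll);
   random u ~ normal(0, s2u) subject=id;
run;

The parameter gamma measures the association between the biomarker trajectory and the hazard; it is this linkage that drives the efficiency gains mentioned above.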


Final remarks

It is easy to find fault with composites when they appear cobbled together and ad hoc; the criticisms are well-known. Yet composites remain a favoured approach. Advances in software and methodology enjoin statisticians to adopt new and better methods, and to acknowledge that the demand for simplicity may not be extraneous (i.e. dictated by regulatory authorities, clients, or the wider medical community) but self-imposed.


Notable sources:

Cook, R. and Lawless, J. (2007). The Statistical Analysis of Recurrent Events. Springer. http://www.springer.com/gp/book/9780387698090