A Brief Follow-up on Bias in Prognosis

Hot on the heels of our posting earlier this week on The Diagnostic Imperative comes a commentary by the NICHD Neonatal Research Network’s Matthew Rysavy and Jon Tyson, expanding on methodological considerations underlying their earlier article on inter-institutional variation in survival among extremely preterm infants. Since limiting treatment on the basis of prognosis was a proposed potential use of genomic sequencing, since the issues overlap substantially, and since perceived prognosis can drive both overtreatment and undertreatment, it’s worth talking about:

Rysavy MA, Tyson JE. The Problem and Promise of Prognosis Research. JAMA Pediatrics. 2016 March 14. Epub ahead of print.

Rysavy and Tyson argue that both the methods and the interpretation of prognostic testing have not kept pace with advances in therapeutic studies, and that further attention is needed both from clinical epidemiologists, to develop the field, and from those who undertake and read these studies. Using the illustrative case of borderline viability, they list the potential problems below. I have supplemented their contributions with some from Dr. Susanne Hay, a fellow in the Harvard Neonatal Perinatal Medicine Fellowship who recently presented an excellent summary of bias in this field.

  1. Sampling bias, resulting in compromised external validity. The results in the original Rysavy study came from the participating NICUs in the NRN, where they found substantial differences in outcome between institutions. Some of this was explained by decisions not to resuscitate, but not all. There are also differences in severity of illness, related to prenatal care, ethnicity, etc. All assessments of future risk, whether based on gestational age or on genetic sequencing, must take into account the baseline risk of the population. Indeed, for genetic testing, the positive predictive value depends directly on this baseline.
  2. Attrition/survivorship bias. This is the core of the argument around inter-institutional variability in survival at borderline gestational age, in which patients who are not resuscitated appear as having had an unsuccessful intervention (that is, they are in the denominator but not in the numerator of the survival rate).
  3. Ascertainment bias. Prior to the decision to resuscitate, babies must be appropriately classified as liveborn or stillborn. Only those ascertained as liveborn (and also subsequently resuscitated) are monitored for survival. The others appear neither in the numerator nor in the denominator.
  4. Misclassification of exposure. Even the best estimates of gestational age are known to have a margin of error. This is compounded by the heterogeneity in risk between earlier and later points within the same week that are then rounded up, as well as by contributing factors such as growth restriction and pregnancy comorbidities. In genomic testing, misclassification arises from uncertainty about whether variants are pathogenic, from incidental findings, and from false positive results.
  5. Misclassification of outcome. Even in the assessment of gestational age risk, there is substantial personal preference regarding which outcomes – survival, development or functional health status/utility – are appropriate metrics. That potential for misclassification is even higher in genetic testing, where each incidental variant is associated with a specific constellation of findings that might vary between groups.
  6. Chance. For both borderline gestational age and genetic variants, the available sample for study may be small. Reports of prognosis (such as on the NICHD Extremely Preterm Birth Calculator) often do not include appropriate measures of statistical uncertainty such as confidence intervals, and these may require special techniques in the context of genome-wide studies.
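
Three of the points above are easy to see with a little arithmetic: the dependence of positive predictive value on baseline risk (item 1), the effect of the chosen denominator on a reported survival rate (items 2 and 3), and the width of a confidence interval when the sample is small (item 6). The Python sketch below is only an illustration. The function names are my own, every number passed to them is hypothetical and not drawn from the NRN data or any calculator, and the Wilson score interval is one common choice for a proportion, not necessarily the method Rysavy and Tyson have in mind.

```python
from math import sqrt


def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV from Bayes' rule: true positives / all positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def survival_rates(survivors, resuscitated, not_resuscitated):
    """Return survival computed over two different denominators.

    The first counts only resuscitated infants; the second counts all
    liveborn infants, including those not resuscitated (who are assumed
    here not to survive).
    """
    among_resuscitated = survivors / resuscitated
    among_all_liveborn = survivors / (resuscitated + not_resuscitated)
    return among_resuscitated, among_all_liveborn


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half_width = z * sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Same hypothetical test (99% sensitive, 99% specific), different baselines.
    for prevalence in (0.001, 0.01, 0.5):
        ppv = positive_predictive_value(0.99, 0.99, prevalence)
        print(f"prevalence {prevalence:>5}: PPV = {ppv:.3f}")

    # Hypothetical unit: 10 survivors, 20 resuscitated, 20 not resuscitated.
    treated, all_liveborn = survival_rates(10, 20, 20)
    print(f"survival among resuscitated: {treated:.2f}")
    print(f"survival among all liveborn: {all_liveborn:.2f}")

    # Hypothetical small sample: 5 survivors out of 10.
    low, high = wilson_interval(5, 10)
    print(f"95% interval for 5/10: {low:.2f} to {high:.2f}")
```

With the same test characteristics, the positive predictive value falls sharply as the baseline risk falls. The same count of survivors yields a very different "survival rate" depending on who is counted in the denominator. And with only ten infants, the interval around a point estimate of 50% is wide enough to cover most clinically plausible values.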

There are undoubtedly other such methodological considerations. Rysavy and Tyson’s call for a better approach to study, through protocol registration and research guidelines, is a point well taken.