A Brief Follow-up on Bias in Prognosis

Hot on the heels of our posting earlier this week on The Diagnostic Imperative comes a commentary by the NICHD Neonatal Research Network’s Matthew Rysavy and Jon Tyson, expanding on methodological considerations underlying their earlier article on inter-institutional variation in survival in extremely preterm infants. Since treatment limitation for prognosis was a proposed potential use of genomic sequencing, since the issues overlap substantially, and since perceived prognosis can drive both overtreatment and undertreatment, it’s worth talking about:

Rysavy MA, Tyson JE. The Problem and Promise of Prognosis Research. JAMA Pediatrics. Published online ahead of print March 14, 2016.

Rysavy and Tyson argue that both the methods and interpretation of prognostic testing have not kept pace with advances in therapeutic studies, and that further attention is needed by clinical epidemiologists to develop the field and by those undertaking and reading studies. The list of potential problems, using the illustrative case of borderline viability, includes the following. I list their contributions supplemented by some from Dr. Susanne Hay, a fellow in the Harvard Neonatal Perinatal Medicine Fellowship who recently presented an excellent summary of bias in this field.

  1. Sampling bias, resulting in compromised external validity. The results in the original Rysavy study were from the participating NICUs in the NRN, where they found substantial differences in outcome between institutions. Some of this was explained by decisions not to resuscitate, but not all. There are differences in severity of illness, related to prenatal care, ethnicity, etc. All assessments of future risk, whether in gestational age or genetic sequencing, must take into account the baseline risk of the population. Indeed, for genetic testing, the positive predictive value depends directly on this baseline.
  2. Attrition/survivorship bias. This is the core of the argument around inter-institutional variability in survival at borderline gestational age, in which patients who are not resuscitated appear as having had unsuccessful intervention (that is, they are in the denominator but not in the numerator of the survival ratio).
  3. Ascertainment bias. Prior to the decision to resuscitate, babies must be appropriately classified as liveborn or stillborn. Only those ascertained as liveborn (and also subsequently resuscitated) are monitored for survival. The others appear neither in the numerator nor in the denominator.
  4. Misclassification of exposure. Even the best estimates of gestational age are known to have a margin of error. This is compounded by the heterogeneity in risk between earlier and later points within the same week that are then rounded up, as well as contributing factors such as growth restriction and pregnancy comorbidities. In genomic testing, misclassification occurs through uncertainty regarding whether variants are pathogenic, incidental findings, and false positive results.
  5. Misclassification of outcome. Even in the assessment of gestational age risk, there is substantial personal preference regarding which outcomes – survival, development or functional health status/utility – are appropriate metrics. That potential for misclassification is even higher in genetic testing, where each incidental variant is associated with a specific constellation of findings that might vary between groups.
  6. Chance. For both borderline gestational age and genetic variants, the available sample for study may be small. Reports of prognosis (such as on the NICHD Extremely Preterm Birth Calculator) often do not include appropriate measures of statistical uncertainty such as confidence intervals, and these may require special techniques in the context of genome-wide studies.
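
For readers who like to see the arithmetic, here is a minimal Python sketch of three of the points above: how positive predictive value moves with baseline risk (item 1), how the choice of denominator changes an apparent survival rate (items 2 and 3), and how wide a confidence interval can be in a small sample (item 6). Every number is hypothetical and chosen only for illustration; none comes from Rysavy and Tyson or from NRN data. The function names are my own, and the Wilson score interval is just one reasonable choice of method, not one the authors prescribe.

```python
from math import sqrt


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: probability of the condition given a positive test."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def survival_rate(survivors, denominator):
    """Survivors divided by whichever group is chosen as the denominator."""
    return survivors / denominator


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical test that is 99% sensitive and 99% specific,
    # applied to a high-risk and a low-risk population.
    for prevalence in (0.1, 0.001):
        ppv = positive_predictive_value(0.99, 0.99, prevalence)
        print(f"prevalence {prevalence}: PPV {ppv:.3f}")

    # Hypothetical unit: 40 liveborn infants, 20 resuscitated, 10 survivors.
    print("survival among all liveborn:", survival_rate(10, 40))
    print("survival among resuscitated:", survival_rate(10, 20))

    # Uncertainty around 10 survivors out of 20 resuscitated infants.
    low, high = wilson_interval(10, 20)
    print(f"approximate 95% interval: {low:.2f} to {high:.2f}")
```

The same hypothetical unit reports very different survival depending on who is counted, and even the more favorable figure carries an interval wide enough to overlap with many other units.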

There are undoubtedly other such methodological considerations. Rysavy’s and Tyson’s call for a better approach to study, through protocol registration and research guidelines, is a point well taken.

 

The Diagnostic Imperative and Next Generation Sequencing

Two seismic events in the past three years have caused the ground to shift in neonatal genetics. After a lifetime of our thinking about pattern recognition, family histories and highly focused testing with poor sensitivity, next-generation techniques of whole exome and genome sequencing are suddenly not only technically feasible away from the research lab, but also seemingly quite inexpensive. The potential for enormous diagnostic and prognostic impact in the NICU is substantial, and some of the futurists among us are even whispering about universal genomic screening of newborns to supplement or replace current newborn screening approaches. In that context, the newest of a handful of commentaries dealing with the bigger picture is welcome: Wilkinson D, Barnett C, Savulescu J, Newson A. Genomic intensive care: should we perform genome testing in critically ill newborns? Arch Dis Child Fetal Neonatal Ed March 2016 Vol 101 No 2

The authors present a balanced, case-based discussion of the potential consequences of genomic testing. They outline its potential uses: treatment modification to improve biological outcome, treatment limitation to improve end-of-life transition for fatal conditions, and anticipation or information in cases in between. Essentially, the argument is that widespread genomic screening and testing is imminent (from direct-to-consumer sources if not the formal medical establishment), and that we need to decide the circumstances in which to implement it.

The paper argues (appropriately) that there should not be a presumption against testing, but I’d argue that we actually have the opposite. We’ve not done a very good job in neonatology of holding back on testing. We operate under a diagnostic imperative: when we can test, we do, because information seems like an undeniable good. Consider cranial ultrasound screening for prognosis, which is undertaken almost universally and repeatedly in preterm infants without evidence that it changes outcomes, and without particularly good predictive validity. There are certainly babies in whom the risk is elevated and the test characteristics better, but a screening, rather than diagnostic, approach has had a firm hold for twenty years.

Why might we want to resist the diagnostic imperative? First, tests that have poor test characteristics tend to lead to a diagnostic cascade, with the potential for eventual adverse biological consequences of either the tests themselves or of the resulting interventions. Next generation sequencing, with its enormous amount of uncertain but potentially actionable information in a single report, is as stark an example as we have seen of this phenomenon. Second, the costs of testing are often far greater than the cost of the test itself. In part this is a consequence of that same diagnostic cascade, but it is also from a second, therapeutic cascade. In the case of genomic testing, for example, the intensive personnel involvement for interpretation, consultation and counseling is likely to swamp the famously $500 genome. Finally, and most importantly, prognostic (or even diagnostic) information may simply not be wanted by parents. We’ve only started to ask families about this regarding neuroimaging; for the multiple prognostic findings in a screened genome, the preference issues multiply exponentially.

Although the article frames the discussion as ‘ethics,’ most of the arguments are thus actually not in a moral sphere but rather a practical, societal one (and therefore of additional interest to us here). The management of false positives, incidental findings, costs, uncertainty, and actionability are all technical to some degree, in that they can be measured, judged against other tests, and optimized before introduction. So why not defer until we’ve measured and considered, just a bit? This is exactly what is underway in the NSIGHT Consortium, a multi-site initiative of the National Human Genome Research Institute, which will study both NICU babies and healthy term infants. The investigators will “explore, in a limited but deliberate manner, the implications, challenges and opportunities associated with the possible use of genomic sequence information in the newborn period.” Critically, the program includes analysis of the ethical, legal and social implications. Dr. Dmitry Dukhovny, a health services researcher and neonatologist at OHSU, will co-lead an examination of economic outcomes.

To their credit, the authors call for “revision of models of informed consent” prior to introduction of NGS. We should take care that we have accurate, empirical information to present during that process, before we launch into it.

 


Newborn Medicine and the “Third D”

So why another blog about newborns?

In an interview a few years ago, Paul Farmer, the co-founder of Partners in Health and arguably one of our most effective communicators about obstacles to social justice and global health at the grandest scale, posited that the largest of those obstacles relates not to what we know about effective technologies, but to our ability to deliver them to patients. The coming revolution, he suggested, would be driven by this “third D” – Delivery – and progressively less by the other Ds, Discovery and Development. Farmer was thinking about extreme disadvantage in predominantly under-resourced settings, but I was fascinated by how clearly he had articulated the central problem in our own, highly privileged and technologically-advanced field.

The Three Ds (and their other formulation, the “clinical and translational research spectrum”) do not just describe a range of investigation; they also trace the dominant activities in our field over time. Over the past several hundred years, progress in medicine has typically started with epiphanies of basic or observational science by those archetypal white-coated (or, before that, be-wigged) scientists who populate old movies. The most prominent examples are the basso continuo of our pre-clinical medical school reading: William Harvey’s description of the circulation; Anton van Leeuwenhoek and the discovery of blood cells; Louis Pasteur and germ theory; Alexander Fleming’s accidental discovery of penicillin; Edward Jenner’s connecting of cowpox to prevention of smallpox; Jonas Salk and the cataloguing of polio virus; James Watson, Francis Crick and Rosalind Franklin and DNA. The classic neonatology-specific example is the work of John Clements, whose painstaking measurements of surface tension and identification of surfactant, the deficiency of which was subsequently confirmed by Mary Ellen Avery to be the cause of respiratory distress syndrome, are the foundations for our field.

Such laboratory advances have continued at an accelerated pace, but much of the music of the twentieth century involved understanding how basic mechanisms could be safely and reliably translated into interventions that directly improve health. There had been earlier examples, of course, like William Morton’s “Gentlemen, This is No Humbug” use of anesthesia in 1846. But the most powerful advances came along with the evolution of rigorous study design, particularly the randomized controlled trial introduced by Austin Bradford Hill in the 1940s. Within a few years of Hill’s first RCT, Jonas Salk had employed the approach with an audacious, partially randomized trial of the inactivated polio vaccine in schoolchildren, and evidence-based pediatrics was on its way.

“…even as we rack up successes,
the failures to apply what we know
become more dramatic”

Newborns and infants have benefitted disproportionately from Discovery and Development.  Deaths from diphtheria, mumps, pertussis and tetanus decreased by 99% following vaccine introduction. Tetsuro Fujiwara’s exogenous surfactant, building on the discoveries of Avery and Clements, decreased infant mortality by more than half in only a few years. Some of the earliest randomized trials were completed in newborns, and the Cochrane Neonatal Review Group was founded third of some 53 currently operating worldwide.

So why do we need a Delivery Revolution in neonatology? The problem is that, even as we rack up successes, the failures to apply what we know become more dramatic. Globally, 44% of the 6.3 million annual under-5 deaths are still in the neonatal period, mainly from prematurity, pneumonia and intrapartum complications. Here in the US,  prematurity-related infant mortality rates for African-Americans are 3.9 times those of white babies. When we have efficacious therapies, we don’t necessarily choose to use them: even after more than 3,000 pregnancies were enrolled in RCTs consistently showing antenatal corticosteroids to be safe and effective, fewer than 40% of eligible women were receiving them. Enormous interinstitutional variability has been documented repeatedly, as for example in the Vermont Oxford Network, where 25% of units in 2008 used early CPAP in fewer than 1/3 of their babies, while 25% used it in 3/4 of patients.  Finally, it is no secret that this inconsistency is seen in a US health care system that costs 50% more than the next runner in the pack.

I freely admit to being a trials nerd, and it is crucial that we continue to examine efficacy results from rigorous studies, as is reported in the other excellent blogs that I read religiously (here and here – sign up if you are one of the few who haven’t already), and to keep our eyes open for more of those early stage laboratory insights that fuel them. But those efforts, and the huge research resources that are invested to generate them, will be wasted if we don’t get them to the bedside. Tools and research approaches are available, but less widely discussed and poorly funded in neonatology. We’ll talk about them here: health services and population research, research strategy and policy, implementation science, health economics, global and domestic health policy, and whatever else we can collectively identify as impediments to infant health. I’ll lead with my own observations, but please consider this an invitation to a vigorous conversation…