It's 0200. A ward doctor calls you about a 78-year-old with worsening type 2 respiratory failure. COPD. Third admission this year with the same. He's frail, but independent. The registrar wants to know: should this patient come to ICU?

You make a decision.

Now rewind the tape. Same patient, same referral. But the registrar calls your colleague instead.

Would they make the same call?


The coin-flip problem

In a study that should unsettle every intensivist, 146 consultants were asked to estimate the survival of an elderly patient being considered for ICU admission.1 Their estimates ranged from 0% to 95%. Their agreement on whether to admit or decline was barely better than chance.

Read that again. Not a 10-point spread. Zero to ninety-five.

Even when two intensivists gave similar survival estimates, they often made opposite admission decisions.1 The disagreement wasn't just about prognosis. It was about what the prognosis meant. About what counted as "too sick" or "not sick enough" or "too far gone."

This isn't an outlier finding. A systematic review of the ICU triage literature identified poor agreement among clinicians, on both prognosis and the appropriateness of admission, as a recurring theme.2 When intensivists evaluated patients with severe COPD exacerbations, they requested different information, weighed it differently, and interpreted the same data to reach opposing conclusions.3

One observational study found that decisions to limit life-sustaining treatment were more strongly associated with the identity of the assessing intensivist than with the patient's presenting complaint or comorbidities.4

Your patient's prognosis depends on who's on tonight.


Noise, not bias

Cognitive scientists have a name for this kind of unwanted variability. They call it noise.

Noise is different from bias. Bias pulls everyone in the same wrong direction: a systematic deviation. Noise is scatter. It's the variability you get when the same case, presented to different clinicians, produces wildly different judgements. Or when the same clinician, on a different day, might decide differently.

Bias gets all the attention. We train for it, we build checklists against it, we publish about it. Noise is harder to see, because no individual decision looks obviously wrong. It only becomes visible when you line up many decisions side by side and notice the spread.
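The distinction can be made concrete with a toy simulation (all numbers here are invented for illustration, not drawn from the studies above). Imagine a single case with a notional "true" survival of 40%, assessed by two hypothetical panels of twenty clinicians: one panel is systematically 15 points too pessimistic but internally consistent (bias), the other has no systematic pull but scatters widely around the truth (noise).

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the illustration is reproducible

TRUE_SURVIVAL = 40  # hypothetical "correct" survival estimate for one case, in %

# Bias: every assessor is pulled the same way (here, 15 points too
# pessimistic), but they largely agree with one another.
biased_panel = [TRUE_SURVIVAL - 15 + random.gauss(0, 2) for _ in range(20)]

# Noise: no systematic pull, but the same case produces a wide
# scatter of estimates across assessors.
noisy_panel = [TRUE_SURVIVAL + random.gauss(0, 20) for _ in range(20)]

for name, panel in [("biased", biased_panel), ("noisy", noisy_panel)]:
    error = mean(panel) - TRUE_SURVIVAL   # systematic deviation (bias)
    scatter = pstdev(panel)               # between-assessor spread (noise)
    print(f"{name:>6} panel: mean error {error:+5.1f}, spread {scatter:4.1f}")
```

Any single estimate from the noisy panel looks defensible on its own; the problem only shows up in the spread, which is exactly why noise stays invisible until decisions are lined up side by side.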

This isn't unique to intensive care. Wherever experienced specialists exercise clinical judgement on complex problems — breast pathology,5 surgical diagnosis,6 cardiac risk stratification7 — studies find the same pattern: strikingly different conclusions from the same information. Intensive care triage is, if anything, a more fertile environment for noise. The decisions are urgent, the information is incomplete, the stakes are existential, and the cases are complex enough that two reasonable clinicians can weigh the same data and reach different answers in good faith.


This isn't about bad doctors

This is the point where I need to be explicit, because there is a tempting misreading of these data: that intensivists are unreliable. That reading is wrong.

The variability isn't a failure of individual competence. It's a property of the system. Complex decisions made under uncertainty, time pressure, and incomplete information will always produce scatter, in any domain, by any expert. Sentencing judges show it. Financial analysts show it. Radiologists show it. It would be remarkable if intensivists didn't.

The question isn't whether noise exists. It's whether we acknowledge it, and what we do about it.


The noise we refuse to see

The traditional response to decision variability is to build algorithms. Risk scores, prediction models, admission criteria. There's evidence this can help: structured tools tend to reduce noise by anchoring decisions to a shared framework.8

But anyone who has worked in an ICU knows why purely algorithmic triage hasn't taken hold.9 These decisions require integration of physiology, trajectory, context, goals of care, resource availability, and clinical gestalt. No score captures all of that. The clinician's judgement isn't the problem to be solved. It's the irreplaceable core of the process.

When people think about designing systems around human performance in healthcare, they tend to think about fatigue management, handover protocols, checklists. Those things matter. But the consistency of human judgement is a design problem too — and it's one we've barely acknowledged. We've built triage around the assumption that clinical decision-making is consistent enough not to need structure. The data says otherwise.

The problem isn't that clinicians disagree. It's that each decision is made alone, at 0200, with no structured way to compare it against what a colleague would have done with the same information. The noise never becomes visible, so it never gets interrogated.

We don't need to replace clinical judgement to take this seriously. We need to design systems that make it visible — to ourselves, to each other, and to the next shift. A system that cannot see its own noise cannot improve.


References

  1. McNarry AF, Goldhill DR. Intensive care admission decisions for a patient with limited survival prospects: a questionnaire and database analysis. Intensive Care Medicine 2004; 30: 325–330.
  2. Gopalan PD, Pershad S. Decision-making in ICU — a systematic review of factors considered important by ICU clinician decision makers with regard to ICU triage decisions. Journal of Critical Care 2019; 50: 99–110.
  3. Kostopoulou O, Wildman M. Sources of variability in uncertain medical decisions in the ICU: a process tracing study. BMJ Quality & Safety 2004; 13(4): 272–280.
  4. Garland A, Connors AF. Physicians' influence over decisions to forego life support. Journal of Palliative Medicine 2007; 10(6): 1298–1305.
  5. Jain RK, Mehta R, Dimitrov R, et al. Atypical ductal hyperplasia: interobserver and intraobserver variability. Modern Pathology 2011; 24(7): 917–923.
  6. Buchweitz O, Wülfing P, Malik E. Interobserver variability in the diagnosis of minimal and mild endometriosis. European Journal of Obstetrics & Gynecology and Reproductive Biology 2005; 122(2): 213–217.
  7. Gershon CA, et al. Inter-rater reliability of the HEART score. Academic Emergency Medicine 2019; 26(5): 552–555.
  8. Ramos JGR, Perondi B, Dias RD, et al. Development of an algorithm to aid triage decisions for intensive care unit admission: a clinical vignette and retrospective cohort study. Critical Care 2016; 20: 1–9.
  9. Charlesworth M, Mort M, Smith AF. An observational study of critical care physicians' assessment and decision-making practices in response to patient referrals. Anaesthesia 2017; 72(1): 80–92.

Scott Santinon is an Intensive Care Fellow and Certified Practitioner in Human Factors in Healthcare, and the founder of Critical Condition.