Architecture diagram. Three passive verification channels as a floor (ambient displays with Visual Patient as existence proof; behavioral inference with Joachims and Baker-Tenenbaum BToM; action-precondition gates with anticoagulation and allergy gates), each labeled with its mechanism and its documented failure mode. Below the floor, a dashed band labeled residual gap: silent inaction divergence, no order, no query, no gate trigger. Below the gap, a smaller centered box labeled active read-back: the closure mechanism, narrow, AI-targeted, at workflow seam, WHO Surgical Safety Checklist as model. Footer reads: passive covers volume; active read-back covers the residual.
May 19, 2026

Quiet Verification

Compressed Medicine · Companion to Part 10 · Passive by default, active for closure

By Sunny Harris, MD

2:47 a.m. The intern is on hour ten of a twelve-hour shift, signing orders on the seventh patient of the night. A modal fires: penicillin allergy, are you sure about the ceftriaxone, Yes/No. Click Yes. This is the fourth time tonight that modal has fired. Three of those four times, the cross-reactivity risk was negligible. Once it was not, but the modal was dismissed before the sentence finished parsing in conscious thought. The signal was registered, not received.

A typographic scene at 02:47, hour ten of twelve. A mock modal dialog asks 'Patient allergic to penicillin. Continue ordering ceftriaxone?' with No and Yes buttons; the Yes button is highlighted in warm amber and labeled 'click 187 ms'. A horizontal time axis below marks 200 ms (click registered), 400 ms (word parsed), 600 ms (meaning resolved), 900 ms (action considered); the click sits before the parsing tick. Caption: the click happens before the parsing; the modal was registered, not received.

This is what a verification mechanism looks like after it has lost its signal. The modal still appears. The acknowledgment is still logged. The audit trail still shows compliance. The information the modal was supposed to deliver does not reach the receiver, because by hour ten of a shift the receiver is no longer parsing modals as sentences. The clinician is parsing them as click targets, and the click is faster than the cognition.

The previous essays in this series have laid out a verification architecture for clinical AI. Verification is one of two operations on that architecture, paired with acquisition (treated in a companion essay); this essay addresses the friction-engineering problem for the verification half. Inference under partial observability requires that the model's representation of the patient (a mixture distribution over hypothesis-components, each with its own evidence dependencies, predicted trajectory, and action threshold) stay coupled to reality through observable signals on multiple channels. Verification fails when the signal is not registered, when the receiver fills in from the wrong prior, or when shared vocabulary masks divergent state. The structural mitigations exist: hypothesis matrices, structured read-back, calibrated probability language, provenance per assertion. The question here is what these mitigations cost the clinician at 3 a.m. on hour ten, and what design moves keep them working when that cost is the dominant variable.

The evidence on this is thirty years old and unambiguous. Van der Sijs et al. in 2006 measured override rates of 49 to 96 percent on drug-safety alerts in CPOE; the high-end studies showed the alerts were dismissed reflexively in under a second, faster than the alert text could be read. Strom et al. in 2010 ran a randomized trial of a near-hard-stop alert for warfarin combined with trimethoprim-sulfamethoxazole and stopped the trial early because the hard stop produced treatment delays of up to three days; clinicians worked around the gate by paging colleagues, writing for a different drug they did not want, or deferring the indication. Wong et al. in 2021 externally validated the Epic Sepsis Model and found it missed two thirds of sepsis cases while raising the per-clinician alert burden by an order of magnitude.

The deeper finding underneath these results is the one that should organize a verification architecture. Repeated low-yield modal alerts do not merely annoy. They train a pre-cognitive suppression reflex. The clinician learns to dismiss before parsing, and once that reflex is installed it cannot be locally disabled. The rare true positive is dismissed by the same reflex that dismisses the false positives. The mechanism does not merely fail to deliver its signal; it destroys the channel, by training the receiver to suppress whatever arrives on it.

The corollary for clinical AI is that almost no verification can run on the active channel. The active channel is the clinician's foveal attention, claimed by an interruption that requires explicit acknowledgment. Anything that costs the clinician visible time at 3 a.m. will be bypassed; anything bypassed often enough will be bypassed reflexively even when the bypass is wrong. The verification architecture has to put almost all of its work on channels the clinician does not have to spend attention on.

The floor

Three channels can carry verification without claiming foveal attention.

The first is ambient display. Mark Weiser and John Seely Brown, working at Xerox PARC in 1996, named the design principle: information moves between the periphery and the center of attention only when the stakes change. The periphery is what we are attuned to without attending to explicitly, the way an experienced driver is attuned to the engine note. Pre-attentive visual channels (color, motion, position, orientation) are processed in under 250 milliseconds with no measurable cognitive load. Verification signals can live peripherally by default and recruit central attention only on divergence.

The cleanest existence proof for this in clinical work is the Visual Patient avatar, developed by David Tscholl and colleagues at Zurich. Vital-sign streams are encoded as a body-shaped avatar whose color, shape, and motion change pre-attentively. Cyanosis appears as blue. Low blood pressure deflates the outline. Eye-tracking studies show faster and more accurate detection of critical events compared with numerical displays, with no added clicks. Two thirds of preventable anesthesia complications trace to situational-awareness failures; avatar displays attack exactly that surface. The translation to AI verification is direct. A calibrated probability badge that intensifies as the AI's confidence in a leading hypothesis drops. A provenance glyph that only surfaces when source disagreement appears. A peripheral indicator on the differential that brightens when cross-evidence diverges.
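The probability badge described above can be sketched as a mapping from the fall in the leading hypothesis's probability to a peripheral intensity, with a separate threshold at which the signal is allowed to recruit central attention. This is a minimal sketch under stated assumptions: the names `BadgeState` and `badge_state`, the linear ramp, and both threshold values are hypothetical and not taken from any deployed display.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BadgeState:
    intensity: float  # 0.0 (invisible) to 1.0 (fully saturated)
    escalate: bool    # True when the cue should move from periphery to center


def badge_state(baseline_p: float, current_p: float,
                quiet_drop: float = 0.05,
                escalate_drop: float = 0.40) -> BadgeState:
    """Map the drop in the leading hypothesis's probability to a peripheral cue.

    Both thresholds are illustrative assumptions, not validated values.
    """
    drop = max(0.0, baseline_p - current_p)
    if drop <= quiet_drop:
        # Small fluctuations stay invisible so the badge does not become noise.
        return BadgeState(0.0, False)
    # Linear ramp between the two thresholds, clamped at full intensity.
    ramp = (drop - quiet_drop) / (escalate_drop - quiet_drop)
    return BadgeState(round(min(1.0, ramp), 3), drop >= escalate_drop)


steady = badge_state(0.80, 0.78)    # negligible drift
drifting = badge_state(0.80, 0.60)  # moderate drop, peripheral only
diverged = badge_state(0.80, 0.30)  # large drop, recruits attention
```

The design choice worth noting is the quiet band at the bottom: a badge that is dark most of the time has less to lose to habituation than one that is always lit.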

The second channel is behavioral inference. The clinician's stream of actions, queries, and non-actions is itself a continuous read-back of their own mixture over hypothesis-components. Implicit-feedback research in information retrieval (Joachims et al., 2005; Kelly and Teevan, 2003) established that clickthrough, dwell, and abandon signals can be decoded as preference data without explicit confirmation. Bayesian theory of mind (Baker, Saxe, Tenenbaum, 2009) formalized inverse planning: assume the agent is approximately rational, invert their policy, recover the beliefs and utilities that best explain the action trajectory. The clinical translation is direct. A clinician ordering a CT pulmonary angiogram when the AI's mixture placed PE at low weight is signaling a posterior shift that no one stated out loud. Not ordering troponin when the AI's leading component is acute coronary syndrome is another. Consulting cardiology on a chart the AI summarized as primarily musculoskeletal is a third. Each is a divergence between two mixtures, registered without asking the clinician for any extra step.
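A toy version of this reading can be written as a naive Bayesian update: start the clinician's inferred mixture at the AI's mixture, reweight it by how likely each observed action is under each hypothesis, and measure the gap between the two mixtures. This is a deliberate simplification, not the full inverse-planning model of Baker, Saxe, and Tenenbaum; the likelihood table, the action names, and the use of total variation distance as the divergence measure are all assumptions made for illustration.

```python
from typing import Dict, List

Mixture = Dict[str, float]

# Hypothetical likelihoods P(action | clinician's working hypothesis).
ACTION_LIKELIHOOD: Dict[str, Mixture] = {
    "order_ctpa":      {"PE": 0.60, "ACS": 0.05, "MSK": 0.01},
    "order_troponin":  {"PE": 0.30, "ACS": 0.80, "MSK": 0.05},
    "order_ibuprofen": {"PE": 0.02, "ACS": 0.02, "MSK": 0.70},
}


def normalize(weights: Mixture) -> Mixture:
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def infer_clinician_mixture(prior: Mixture, actions: List[str]) -> Mixture:
    """Reweight the prior by each observed action (naive Bayes over actions)."""
    posterior = dict(prior)
    for action in actions:
        likelihood = ACTION_LIKELIHOOD[action]
        posterior = normalize({h: posterior[h] * likelihood[h] for h in posterior})
    return posterior


def total_variation(p: Mixture, q: Mixture) -> float:
    """Half the L1 distance between two mixtures over the same hypotheses."""
    return 0.5 * sum(abs(p[h] - q[h]) for h in p)


ai_mixture = {"PE": 0.05, "ACS": 0.15, "MSK": 0.80}

after_ctpa = infer_clinician_mixture(ai_mixture, ["order_ctpa"])
after_nothing = infer_clinician_mixture(ai_mixture, [])

ctpa_divergence = total_variation(ai_mixture, after_ctpa)
silent_divergence = total_variation(ai_mixture, after_nothing)
```

With these made-up numbers, a single CT pulmonary angiogram order moves the inferred leading hypothesis to PE and produces a large divergence, while an empty action stream leaves the prior untouched and reports a divergence of zero.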

The third channel is action-precondition checks. Production clinical decision support has, for years, half-implemented these for a specific class of high-stakes actions: anticoagulation before a procedure, pregnancy before a teratogen, allergy before order signing, code status before escalation. The gate fires when a specific action would commit, and the system compares the action to its internal state. The clinician does no extra work unless the gate fires; when it does, the friction is reserved for slots where being wrong is irreversible and the cost asymmetry justifies the interrupt. This is the only modal interrupt that has consistently survived contact with the shift, and it survived because it fires rarely, fires on a specific predicate, and is bound to an action the clinician is already taking. It is also the form the surgical safety checklist takes, scaled down to a single slot.
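The shape of such a gate is simple enough to sketch: a predicate bound to one specific action, evaluated only when that action is about to commit. The sketch below assumes a hypothetical `PatientState`, hypothetical action identifiers, and two illustrative gates; it does not reflect any vendor's decision-support API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class PatientState:
    allergies: Set[str] = field(default_factory=set)
    on_anticoagulant: bool = False


@dataclass(frozen=True)
class Gate:
    action: str                               # the one action this gate binds to
    violated: Callable[[PatientState], bool]  # predicate checked at commit time
    message: str


# Illustrative gate map; a real one would be curated and kept deliberately small.
GATES: List[Gate] = [
    Gate("sign_order:amoxicillin",
         lambda s: "penicillin" in s.allergies,
         "Documented penicillin allergy."),
    Gate("start_procedure:lumbar_puncture",
         lambda s: s.on_anticoagulant,
         "Patient is anticoagulated."),
]


def check_gates(action: str, state: PatientState) -> Optional[str]:
    """Return an interrupt message only if a gate bound to this action fires."""
    for gate in GATES:
        if gate.action == action and gate.violated(state):
            return gate.message
    return None  # No gate fired: the clinician sees nothing.


patient = PatientState(allergies={"penicillin"}, on_anticoagulant=False)
blocked = check_gates("sign_order:amoxicillin", patient)
silent_order = check_gates("sign_order:ibuprofen", patient)
silent_procedure = check_gates("start_procedure:lumbar_puncture", patient)
```

The default path returns `None`, so the cost to the clinician is zero unless a predicate bound to the action in hand is actually violated.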

These three channels are the floor. Ambient displays cover the volume of routine verification by recruiting only peripheral attention. Behavioral inference reads the clinician's behavior as their own implicit read-back without asking them to externalize anything. Action-precondition gates reserve modal interrupt for the slots where the cost asymmetry is undisputed.

What the floor cannot cover

Each channel has a documented failure mode, and the three failure modes do not cancel.

Ambient displays fail through perceptual habituation. Pre-attentive cues that are always present become invisible within weeks; this is settled change-blindness research. A calibrated probability badge that hovers next to every diagnosis on the differential will, after a month of shifts, be processed as wallpaper. The signal sits in the visual field but not in the attentional set. When the badge drifts from 0.22 to 0.71 over a single encounter, the clinician's eyes have looked at it forty times without resolving it. The audit trail will show the system displayed the signal. The clinician will not have seen it. This is the most pernicious class of verification failure because the system can defend itself: it did display the information.

Behavioral inference fails on a more dangerous asymmetry. The mechanism reads divergence from actions taken, but the canonical clinical miss is the diagnosis no one considered. Aortic dissection presenting as back pain: ibuprofen ordered, follow-up scheduled, the chart signed. No CT, no consult, no order pattern inconsistent with the AI's musculoskeletal working diagnosis. The behavioral inference registers agreement, because agreement is exactly what the action stream looks like. The dangerous miss has no behavioral trace, because the nature of the miss is that no one acted on the possibility. Absence is unobservable to a mechanism that reads action.

Action-precondition gates fail because the slots that warrant friction cannot be enumerated in advance. The gate fires when a specific high-stakes action is attempted. The discharges that turn out to be wrong are often the ones where no action was attempted: the postpartum patient sent home with reassurance, no opioid ordered, no procedure pending, no gateable action. The gate map covers the predicates the designers anticipated. The miss happens on the predicates no one anticipated, which is most of them.

Together these three failure modes share a structural feature. The dangerous case is the one in which the clinician does not see, does not act, and does not trigger anything. The signal that should have fired has no behavioral footprint, no peripheral indicator the clinician would attend to, and no action to gate. The three passive channels cannot, by their own design, surface a divergence whose entire signature is inaction.

The closure

This is what active read-back is for: the closure mechanism for the inaction-blind spot that no passive method can cover, not a fallback for high-stakes slots in general.

The model for what this looks like in clinical practice is the World Health Organization Surgical Safety Checklist, evaluated in the study by Haynes, Gawande, and colleagues published in NEJM in 2009. Major complications fell from 11 percent to 7 percent across eight hospitals worldwide; inpatient deaths fell from 1.5 percent to 0.8 percent. The mechanism is structured read-back. The team stops at three pre-defined points (sign-in before induction, time-out before incision, sign-out before leaving the OR), the surgeon names the patient and procedure, the nurse confirms the count, the anesthetist confirms the airway. The checklist takes roughly sixty seconds per phase. There is no override button. There is no modal. There is no individual user being interrupted.

The features that distinguish the surgical safety checklist from the modal alerts that fail are visible once you list them. The verification is bolted to existing workflow boundaries; the team has to stop at those points anyway. The work is distributed across the team, so the cognitive load on any single person is small and the social accountability is high. The number of items is small enough to fit in working memory. The read-back is a ritual the team performs, not a thing the system imposes on a user.

The clinical-AI analog is narrow, targeted active read-back at workflow seams: pre-disposition, pre-procedure, pre-signature. The AI selects one or two slots from the behavioral-inference divergence trace and surfaces them to the clinician at a natural pause. Not as a modal. As a structured prompt the clinician addresses in the same beat as the disposition decision they are already making. The set is small because the friction budget is small. The targeting is AI-driven because the slots that warrant friction are not enumerable in advance; only the divergence trace knows which ones matter for this patient on this shift.
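The selection step can be sketched as a budgeted top-k over the divergence trace, released only at a seam. The sketch assumes each trace entry carries a divergence score and a harm-if-missed weight and ranks by their product; the weighting, the `min_priority` cutoff, and all names here are hypothetical choices for illustration, not a specification.

```python
from dataclasses import dataclass
from typing import List

# Workflow seams named in the text; read-back is released only at these events.
SEAMS = {"pre_disposition", "pre_procedure", "pre_signature"}


@dataclass(frozen=True)
class DivergenceEntry:
    slot: str              # the question to put to the clinician
    divergence: float      # 0..1, gap between AI and inferred clinician belief
    harm_if_missed: float  # 0..1, assumed weight for irreversibility


def priority(entry: DivergenceEntry) -> float:
    return entry.divergence * entry.harm_if_missed


def select_read_back(trace: List[DivergenceEntry], event: str,
                     budget: int = 2, min_priority: float = 0.2) -> List[str]:
    """Pick at most `budget` slots, and only when the event is a workflow seam."""
    if event not in SEAMS:
        return []  # Mid-workflow: stay quiet regardless of the trace.
    ranked = sorted(trace, key=priority, reverse=True)
    return [e.slot for e in ranked[:budget] if priority(e) >= min_priority]


trace = [
    DivergenceEntry("Aortic dissection considered?", 0.60, 0.90),
    DivergenceEntry("Anticoagulation status confirmed?", 0.30, 0.80),
    DivergenceEntry("Diet order matches plan?", 0.90, 0.05),
]

at_seam = select_read_back(trace, "pre_disposition")
mid_workflow = select_read_back(trace, "order_entry")
```

The low-harm item is dropped even though its raw divergence is the highest, which is the point of spending the friction budget on consequence instead of on disagreement alone.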

The architecture

Passive verification is the floor. It covers the volume the system would otherwise have to interrupt over. Active read-back is the closure. It covers exactly the slot that admits no passive instrument: the inaction divergence that has no behavioral signature, no peripheral indicator, no triggering action. The friction budget is reserved for that closure and nothing else.

What this rules out:

The modal pop-up alert as the primary verification mechanism. Thirty years of evidence shows it fails, and the failure mode is not annoyance; it is the trained suppression of the receiver.

Per-claim read-back on every AI suggestion at the point of care. There is no precedent in any field for sustained per-item active verification of high-volume, low-specificity outputs, and the closed-loop traditions in anesthesia and aviation cover episodic high-stakes communications, not continuous dialogue.

Hard stops on AI-generated content. Strom et al. (2010) showed where that leads: clinically dangerous workarounds, treatment delays, the same alert-fatigue problem in a new costume.

What this requires:

Ambient calibrated probability and provenance, surfaced inline in the display the clinician is already reading, never as a new dialog. Behavioral inference running continuously underneath, treating the clinician's action stream as their implicit read-back, surfacing internal model updates without external notification. Action-precondition gates narrowed to the small set of irreversible actions where the cost asymmetry is undisputed. And one or two AI-targeted active read-back items per encounter, presented at a workflow seam the clinician is already pausing at, framed as a structured prompt the team can address in seconds.

At the bedside

The intern at 2:47 a.m. is not going to read more sentences in modals. The shift will not get shorter. The volume of clinical AI output is going to keep climbing. The architecture that survives this is the one the clinician has no reason to rebel against at the end of a twelve-hour shift, because almost all of the verification has happened without the clinician noticing.

The friction budget is a real resource. Spend it where it is the only instrument available. The rest of the verification has to be quiet.

The same three channels and the same discipline carry the other half of the closed loop. When the inference is under-determined and the signal needed to refine it has not arrived, the acquisition layer detects the gap and surfaces it under the same friction rules. That is the symmetric companion to this essay.



Companions to Part 10 · Quiet Verification · Quiet Acquisition