
A Third Processor
Six Essays on Compression · V · AI is another codec in the chain
Two computers want to talk. The wire between them is perfect: clean signal, no dropped bits. Yet every message that arrives on the other side is gibberish. Both machines are working flawlessly. The conversation is broken anyway.
The wire works fine. Buy a faster wire and nothing changes. What's missing is the part of the system no wire can carry: the agreement, between the two machines, about what the bytes mean. The bytes are the message. The agreement is what lets the message become anything at all.
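A small sketch makes the point concrete. The same bytes are read under two different agreements, here two character encodings standing in for the larger idea. The sample word and variable names are illustrative choices, not anything from a real system.

```python
# The same bytes, read under two different agreements (character encodings).
message = "café"
wire_bytes = message.encode("utf-8")  # what actually crosses the channel

# Receiver A shares the sender's agreement.
shared = wire_bytes.decode("utf-8")

# Receiver B assumes a different agreement. Nothing on the wire changed.
mismatched = wire_bytes.decode("latin-1")

print(shared)      # café
print(mismatched)  # cafÃ©
```

Neither decode raises an error. The channel delivered every byte intact; only the receiver's assumption about what the bytes mean differs.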
This is the shape of every act of communication, at every scale. Two finite processors. A channel between them. An agreement about what crosses the channel. Without the agreement, the channel is just noise. The processors don't have to be computers. They can be two doctors handing off a patient at change of shift. They can be two cells signaling across a synapse. They can be the writer of a book and the reader, separated by a century. The substrate is different; the shape is the same.
What does the agreement contain? It contains everything both sides already know. When the doctor writes "p/w CAP, HD3 ceftriaxone-azithro" on the sign-out, the message itself is tiny: about thirty characters. The agreement is enormous: years of training that taught both ends what those letters point to. The wire carries the letters; the agreement carries the patient.
This is why the same message lands differently in different hands. The bytes are identical. The agreements are not. A reader who shares the agreement gets a patient. A reader who doesn't gets a code. The signal didn't change. The codec on the other side did.
Now insert a new kind of processor into the chain. A medical AI reading the same patient note.
The wire is fine. Text in, text out. The agreement is the problem. The AI's sense of what those letters point to was assembled somewhere else, by people who weren't this clinician, on data that wasn't this patient, for purposes that weren't tonight's decision. The AI's "common" often isn't the clinician's "common." The AI's "important" often reflects what was important in the training data, not what matters in this room.
The AI compresses because finite agents must. The harder question is whether its compression is task-shaped, locally calibrated, and legible to the clinician who has to act on it.
It helps to break the mismatch into layers. The AI can miss because it never saw the data (source-data mismatch), because it pulled the wrong chunks at inference time (retrieval mismatch), because the way it encoded what it pulled doesn't match the way a clinician would (representation mismatch), because what it knew uncertainly came out as prose that read confidently (uncertainty mismatch), because the receiver at the other end has different priors than the model expected (receiver mismatch), or because the workflow it slots into is not the workflow it was designed for (workflow mismatch). The single word hallucination is doing the work of several different failure modes that need to be distinguished if you want to fix any of them.
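One way to keep these layers distinct in practice is to label each reviewed failure by layer instead of filing everything under one word. The sketch below is a hypothetical labeling scheme: the `Mismatch` enum mirrors the six layers named above, and the incident list is invented purely for illustration.

```python
from collections import Counter
from enum import Enum


class Mismatch(Enum):
    """The six mismatch layers, one label per reviewed failure."""
    SOURCE_DATA = "never saw the data"
    RETRIEVAL = "pulled the wrong chunks at inference time"
    REPRESENTATION = "encoded it differently than a clinician would"
    UNCERTAINTY = "uncertain knowledge rendered as confident prose"
    RECEIVER = "receiver's priors differ from what the model expected"
    WORKFLOW = "deployed in a workflow it was not designed for"


# Hypothetical reviewer-assigned labels; invented data, not real incidents.
incidents = [
    Mismatch.RETRIEVAL,
    Mismatch.UNCERTAINTY,
    Mismatch.RETRIEVAL,
    Mismatch.WORKFLOW,
]

tally = Counter(incidents)
for kind, count in tally.most_common():
    print(f"{kind.name}: {count} ({kind.value})")
```

A tally like this does not fix anything by itself, but it shows which layer a fix would have to target.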
This is what people are seeing when they say AI gets things wrong in subtle ways. Some AI failures are capability failures: the model lacks the reasoning, the knowledge, or the grounding it would need. But many of the failures that matter clinically are not solved by capability alone. The AI fails because its sense of what matters doesn't match the sense of the human on the other end, for the decisions that need to be made. The output looks fluent, the bytes are clean, but the decompression on the receiving side doesn't land where the AI's output assumed it would. The human reads the words. The situation they picture isn't the situation the AI had inferred.
Take one common case. A primary-care clinician sees a patient for a follow-up, does a focused abdominal exam, moves on. Their ambient scribe (Abridge and Nuance's DAX Copilot are the systems most clinics will recognize) generates the note. Reviewing it that evening, the clinician skims a line: "Abdomen: soft, non-tender, no hepatosplenomegaly." They never said "no hepatosplenomegaly" out loud. The scribe inferred it from the shape of a normal abdominal exam, the way it inferred most of the negatives in the review of systems. The clinician signs. Weeks later a consultant pulls the chart and reads that line as a finding documented at the prior visit. Both ends of the inscription hop treated the sentence as data. The patient is the only one who knows that part of the exam never happened. Hallucinated physical-exam findings of exactly this shape have been documented across the major ambient scribe platforms in recent simulated-encounter work (Mayo Clinic Proceedings: Digital Health, 2025).
Hallucination is the sharpest example of this mismatch. When a language model fills a gap with what's most plausible under its training, it is making a move analogous to a human's under uncertainty: extrapolating from prior patterns when the evidence in front of it is incomplete. The difference is which priors. The physician's priors are shaped by this hospital, this population, this patient's trajectory. The model's priors are shaped by its training and post-training distributions. Both are inferring. Only one is likely to be well adapted to this receiver and this situation.
There's a second problem the AI introduces. Every AI-mediated handoff adds another codec hop, and most clinical AI is chart-mediated, not patient-direct. The patient's experience compresses into what the clinician perceives. The clinician's perception compresses into a chart entry. The AI retrieves chart entries and re-represents them internally. The AI generates an output. The clinician reconstructs a working model from that output. The decision falls out of the reconstruction. Each compression has its own error mode. The clinician at the end may feel they are hearing the patient; what they are reading is an AI's inference layered over a chart's compression of what an earlier clinician saw. Sometimes a good summary cleans up noise; often it doesn't, and the human at the end has no easy way to tell which.
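The arithmetic of stacked hops is easy to underestimate. As a rough sketch, suppose each of the five compressions above independently preserves 90% of the decision-relevant detail. Both the figure and the independence assumption are hypothetical, chosen only to show how losses compound; real hops can also repair noise, which this toy model ignores.

```python
from math import prod

# Hypothetical per-hop fidelity: the fraction of decision-relevant detail
# that survives each compression. Illustrative numbers, not measurements.
hops = {
    "patient experience -> clinician perception": 0.90,
    "clinician perception -> chart entry": 0.90,
    "chart entry -> AI retrieval and representation": 0.90,
    "AI representation -> generated output": 0.90,
    "generated output -> clinician's working model": 0.90,
}

# Assuming independent, multiplicative losses, end-to-end fidelity is the
# product of the per-hop values.
end_to_end = prod(hops.values())

print(f"{len(hops)} hops at 0.90 each -> {end_to_end:.2f} end to end")
```

Under these assumptions roughly 59% survives the full chain, even though no single hop looks lossy.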
The implication for the field is uncomfortable. Building good medical AI takes more than raw capability. A model can be much larger, trained on much more data, and still miss the local sense of what matters, because that mismatch is partly structural. It comes from what the system was trained to notice, what it learned to compress away, and what it never saw. Scale helps many things, but it does not solve this by itself: the training and evaluation regimes that produce capable models also reward confident guessing over calibrated uncertainty, which means more capability can produce more confident wrongness rather than less (Kalai et al., 2025).
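The incentive behind confident guessing can be shown with a few lines of arithmetic. The sketch below compares two hypothetical grading schemes for a model that is only 30% sure of an answer: one that scores a wrong answer the same as "I don't know", and one that subtracts a point for a wrong answer. The function name, the penalty value, and the 30% figure are assumptions made for illustration.

```python
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score of answering: +1 if right, -wrong_penalty if wrong."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


ABSTAIN = 0.0  # score for "I don't know" under both schemes (assumed)

p = 0.3  # the model is only 30% sure

# Binary grading: a wrong answer costs nothing, so guessing beats abstaining.
binary = expected_score(p, wrong_penalty=0.0)

# Penalized grading: a wrong answer costs a point, so abstaining wins.
penalized = expected_score(p, wrong_penalty=1.0)

print(f"binary:    guess {binary:+.2f} vs abstain {ABSTAIN:+.2f}")
print(f"penalized: guess {penalized:+.2f} vs abstain {ABSTAIN:+.2f}")
```

Under binary grading any nonzero chance of being right makes guessing the better strategy, so a system optimized against such a metric has no reason to say "I'm guessing."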
What matters alongside raw capability is alignment of frames: making the AI's sense of what matters match the clinician's sense of what matters for the decision in front of them. Not making the AI smarter in the abstract. Making it speak the same language as the human at the receiving end: the same shorthand, the same priors about what's worth flagging and what's noise, the same practical sense of when to say "I'm sure" and when to say "I'm guessing."
The patient is still sitting at the end of every chain we build. Every codec hop puts another inference between them and the decision about their care. The right question for medical AI is rarely just "is it intelligent enough." The deeper question is whether the way it represents what matters fits the way the person who will act on it represents what matters, for the case in front of them, tonight.