
Clinical AI's Binding Problem
Diagnosis is compression. Care is binding.
1. The relief of the name
"Seventy-six-year-old with COPD, CHF, and pneumonia, septic." A complex stranger collapses into one sentence, and you can hold the sentence in your head.
The relief is real. Before the name, the patient is a cloud of vital signs, symptoms, history fragments, a worried family member, a clock running. After the name, the patient is pneumonia, and pneumonia is a thing you know what to do with. You stop staring at an unbounded problem and start managing a familiar one.
This compression is the engine of clinical work. Teaching, handoffs, orders, billing, public health, guidelines: much of modern medicine is built on the move from patient to name and back again. The compression is not the bug. It is what made modern medicine possible.
But the compression is not the whole act. Medicine has a name for the disease and pathways for the disease. What it lacks is a formal language for fitting that disease knowledge back onto this specific patient. Call that binding. A diagnosis is a pointer to generalized disease knowledge, not a representation of the patient. The patient-specific work of clinical judgment begins when the knowledge is bound back to this person's substrate, trajectory, and goals, against the decision in front of you. That move from name to person is the work this essay is about.
2. The cache is a gift
When a clinician says pneumonia, they are not naming a thing in the world. They are invoking a compressed archive: anatomy since Vesalius, pathology since Virchow, imaging since Roentgen, epidemiology since Snow, microbiology since Koch, randomized trials since streptomycin, genomics since the millennium. Disease names are the most expensive compression algorithms in human science, collapsed into a single word a resident can write in three seconds.
This compression is free at the point of use, and that is its quiet miracle. A first-year resident, never having read a paper on community-acquired pneumonia, writes "76yo with CAP" and inherits the work of generations without effort or awareness. Without the cache, every encounter would start from raw signs and re-derive medicine from scratch. Most patients would die waiting.
The cache is the most generous gift our predecessors have given us, and we use it without acknowledging it because, like all great compressions, it has become invisible.
3. The patient was never in it
Here is a fact that took me a while to feel:
Two 68-year-old men walk into your ED with community-acquired pneumonia. Same vitals: BP 128/74, HR 96, RR 20, SpO2 94% on room air, T 38.4 °C. Same exam: rales at the right base, otherwise unremarkable. Same diagnosis. Same line in the chart. Same row in any database keyed on disease. Opposite trajectories. Different right care.
One is a retired carpenter on stable hypertension monotherapy who came in because his wife (a retired nurse) noticed his cough getting worse. He'll be discharged on oral antibiotics and will likely be running errands by next week. The other is eight years post-kidney-transplant on tacrolimus and prednisone, with a baseline creatinine of 1.6 and three admissions in the last year, and has lived alone since his wife died. The same plan would put him at meaningful risk of fungal superinfection, drug toxicity with his transplant regimen, decompensation overnight, and failure to return. The difference lives in physiologic reserve, the kind that is left after eight years on tacrolimus, three admissions in a year, a year of grief. Reserve has to be inferred, not measured. The medication list, the prior-visit pattern, what he says to the triage nurse, the look on his face: all are signals. None of them are in the labs. Most of them are not in the disease label.
This is the part the cache cannot hold. Disease names compress what generalizes across patients with a condition. What does not generalize is not in the label and cannot be put in it: what this specific person brought into the encounter, what they want from the rest of their life, where they were heading before they got here. The population is in the cache. The individual patient is not. And the individual case is what you have to manage.
Medicine has long sensed this gap. Frailty indices, performance status scales, CURB-65 and NEWS2, goals-of-care documentation: all are proxies that try to carry forward what the disease name does not. Good clinicians lean on them. They are still compressions stacked on the original compression. Knowledge of disease compresses. Patients do not.
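To make "compressions stacked on the original compression" concrete, here is a minimal sketch that scores the two men from section 3 with CURB-65. It rests on assumptions not in the text above: the class and function names are illustrative, "otherwise unremarkable" is read as no new confusion, and urea is left unmeasured because no value is given (with urea omitted, the score reduces to the CRB-65 variant).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TriageSnapshot:
    """Only the fields CURB-65 looks at. Names are illustrative."""
    age: int
    confusion: bool
    resp_rate: int
    systolic_bp: int
    diastolic_bp: int
    urea_mmol_l: Optional[float] = None  # None means not yet measured


def curb65(s: TriageSnapshot) -> int:
    """One point per criterion met; an unmeasured urea scores no point."""
    criteria = [
        s.confusion,
        s.urea_mmol_l is not None and s.urea_mmol_l > 7.0,
        s.resp_rate >= 30,
        s.systolic_bp < 90 or s.diastolic_bp <= 60,
        s.age >= 65,
    ]
    return sum(bool(c) for c in criteria)


# Shared triage values from section 3. Confusion=False is an assumption.
carpenter = TriageSnapshot(age=68, confusion=False, resp_rate=20,
                           systolic_bp=128, diastolic_bp=74)
transplant = TriageSnapshot(age=68, confusion=False, resp_rate=20,
                            systolic_bp=128, diastolic_bp=74)

print(curb65(carpenter), curb65(transplant))
```

The two snapshots are identical, so the score is too. Tacrolimus, the three admissions, and the empty house have no field to live in.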
4. What clinicians actually do
The work clinicians do when they hear a diagnosis is two things in sequence, and only one of them has been fully systematized.
First, they decompress. Hearing "CAP, septic," the trained clinician unfolds a multi-component plan: blood cultures before antibiotics, broad-spectrum coverage, fluid resuscitation, lactate monitoring, source workup, disposition criteria. This is what residency is for. The cache decompresses cleanly into a generic management template, and the template is mostly right for the typical patient.
Then, at the bedside, they bind. The generic template gets adjusted to this patient's specific shape. Renal function modifies the antibiotic dose; anticoagulation changes the procedural plan; baseline cognition changes the disposition decision; the daughter at the bedside changes the conversation about goals; the fact that the patient came in walking unassisted last month changes the expectation of recovery. None of that is in the diagnosis. All of it has to be re-attached to the diagnosis at the bedside, by someone with the patient in front of them.
The intern knows the pneumonia pathway. The attending knows why this patient breaks the pathway. Decompression is what training teaches you to do. Binding is what experience teaches you to do well. The first is the foundation. The second is the work.

5. Two frontiers, neither binding
There are two layers to clinical AI right now. Neither has made binding its center.
The intellectual frontier, meaning the big foundation-model labs and the medical fine-tunes that follow them, is heavily focused on cache enlargement. Bigger models trained on more medical literature. Better differential generation from typed vignettes. Improved performance on MedQA, USMLE, and the rest of the medical knowledge benchmarks. OpenEvidence, Hippocratic AI, Med-PaLM and their peers are all serious work, all aimed at sharper retrieval, synthesis, and decompression. That work is real, and it makes the cache more available outside the academic centers that have always had the easiest access to it.
The deployment layer is more varied. Ambient documentation has become the field's first breakout. Abridge, Ambience, and Microsoft's Dragon Copilot capture the bedside conversation, structure it, and produce the note. Most academic health systems now use one. Imaging AI reads the scan. Workflow automation handles coding, prior authorization, inbox triage. Risk-prediction models score the patient for sepsis or deterioration.
Each layer is doing real work. Neither has crossed into binding.
The medical LLM produces the differential but does not carry the patient across visits or know what this patient wants. The ambient scribe captures the conversation but produces the note, not the management call. The imaging model reads the scan but does not know the goal. The risk model scores against a generic outcome, not the decision actually in front of you. Each system does a piece of what binding requires; none of them does the integration.
The most underweighted axis right now is what gets bound, not what gets captured or retrieved. The patient's work of breathing is in the scribe's transcript but not in the recommendation. The way they stop answering mid-sentence is heard but not interpreted against the goal. The trajectory between 2pm and 4pm is in the monitor but not in the next-question selection. The daughter's face when she says he's not himself is in the room but not in the system that's producing the differential. Some of these facts can be placed in a prompt if elicited and structured. But prompting is not binding. Binding requires persistent patient state, temporal tracking, goal-awareness, and continuous updating against the decision actually in front of the clinician. Literature can inform the priors. It cannot supply the patient-specific state. All of it has to be sensed, elicited, or tracked, and then bound to the disease cache, the goal, and the next decision.
The pieces of binding exist on the market. The system that binds does not. That is the gap that matters.
6. AI plus the doctor, for the patient
A clinical encounter does not start with a diagnosis. It starts with a chief complaint and a goal. The work between them is reducing the right uncertainties about the patient's state to a decision the patient can act on. The diagnosis, when it comes, is one stop on that path. It is not the destination.
The doctor is the author of the plan. They receive input from the bedside nurse who has spent hours with the patient, from the family who knows the baseline, from monitors catching the trends the eye misses, from the patient themselves, and increasingly from an AI that can hold state no human can hold under load. They integrate, decide, and own the call. AI does not replace what happens at the bedside. It is there to help the bedside happen better.
What the doctor cannot do alone, under load at 3 a.m. on hour eleven of a shift, is hold thirty prior visits of one patient at full resolution, compute which next question or test would reduce decision-relevant uncertainty given this patient's goal, remember what the daughter said to the resident six weeks ago, or audit every recommendation for the facts that would invalidate it. The system can. It carries state minute to minute within an encounter and visit to visit across them. It senses what is not in the chart through audio capture of the bedside conversation, through patient-facing channels that surface what the rushed visit would skip, through extraction of facts buried in prior notes. It names the load-bearing assumptions of every recommendation, so the next question, the next test, the next moment of observation is the one that actually reduces the uncertainty the patient is here to resolve. This is value of information in clinical terms. The AI is one more input, but a peculiar one: it holds state across time and refines it against the next decision in front of the patient. Doctors have done this work alone, in their heads, for as long as there have been doctors. That is what the AI changes.
Binding is two operations. The field is making real progress on the second. Research is starting to reach for the first in pieces, in benchmarks and simulations, but no deployed product performs it for a specific living patient at the bedside. Both are work the doctor has always done. Only one has had a partner.
The first operation is patient-state-conditioned diagnosis. Not the textbook noun, but the bedside act: generating a hypothesis list that fits this patient's prior rather than the population's; updating that list as each new datum arrives, across hours within an encounter and across visits between them; choosing the next question, test, or observation because it has the highest expected reduction in the uncertainty the pending decision actually turns on. The transplant patient and the retired carpenter walk in with the same chief complaint and the same disease label and need different next questions, because their priors are different and what would change the disposition is different. This is what good doctors have always done in their heads, one patient at a time, mostly alone. The label pneumonia is what falls out when this work is done well. It is not the work itself.
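One way to read "highest expected reduction in uncertainty" is as expected information gain: the entropy of the current hypothesis list minus the average entropy after seeing a test result. The sketch below is a simplification under stated assumptions. Every probability in it is invented for illustration and is not clinical data, the two tests are hypothetical, and plain entropy over hypotheses stands in for the decision-weighted uncertainty a fuller value-of-information calculation would use.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a {hypothesis: probability} dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def posterior(prior, p_positive_given, result):
    """Bayes update of the hypothesis list on a positive or negative result."""
    unnorm = {
        h: p * (p_positive_given[h] if result else 1 - p_positive_given[h])
        for h, p in prior.items()
    }
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}


def expected_information_gain(prior, p_positive_given):
    """Prior entropy minus the result-weighted average posterior entropy."""
    p_pos = sum(prior[h] * p_positive_given[h] for h in prior)
    gain = entropy(prior)
    for result, p_result in ((True, p_pos), (False, 1 - p_pos)):
        if p_result > 0:
            gain -= p_result * entropy(posterior(prior, p_positive_given, result))
    return gain


def best_next_test(prior, tests):
    """Name of the test with the largest expected information gain."""
    return max(tests, key=lambda t: expected_information_gain(prior, tests[t]))


# Hypothetical tests: P(positive | hypothesis). All numbers are invented.
tests = {
    "bacterial_marker": {"typical_bacterial": 0.70, "opportunistic_fungal": 0.05, "other": 0.05},
    "fungal_marker":    {"typical_bacterial": 0.05, "opportunistic_fungal": 0.85, "other": 0.05},
}

# Same label, different priors. Both priors are invented for illustration.
carpenter_prior = {"typical_bacterial": 0.90, "opportunistic_fungal": 0.01, "other": 0.09}
transplant_prior = {"typical_bacterial": 0.55, "opportunistic_fungal": 0.25, "other": 0.20}

print(best_next_test(carpenter_prior, tests))
print(best_next_test(transplant_prior, tests))
```

With these made-up numbers, the same two tests rank in opposite order for the two men. Nothing about the tests changed; only the prior did.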
The second operation is applying the disease cache to that specific state, fitting the general knowledge to the individual case. This is largely what bigger models, better retrieval, and sharper synthesis are already doing. Hand a modern medical LLM a well-specified patient state and it can produce a useful first draft of a plan. The cache enlargers are working on what is tractable.
Concretely: a binding system would not merely say "pneumonia." It would say: this patient's current risk depends on immunosuppression, renal reserve, medication interactions, ability to return, and the trajectory of oxygen requirement over the next four hours. It would identify which assumptions are load-bearing and what information would change the disposition.

What that partner is, built rather than metaphorical, is a patient-state binding layer: a system that sits between raw encounter data, prior history, disease knowledge, and the physician's next decision, that holds the patient across time the way a long-term primary care doctor used to, and that surfaces what would change the call. Its output is not a diagnosis. Its output is a continuously updated patient-state model: the current substrate, trajectory, goals, decision-relevant uncertainties, load-bearing assumptions, and the next information that would change the plan. That is the category the field has not yet shipped.
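As a rough picture of what such an output could look like as a data structure, here is a minimal sketch. It is an illustration, not a design: the class, field, and method names are hypothetical, and the goal, the assumption wording, and the second oxygen reading are invented for the example. Only the fields themselves mirror the list in the paragraph above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Assumption:
    """A fact the current plan depends on. If it fails, the plan is suspect."""
    statement: str
    holds: bool = True


@dataclass
class PatientState:
    substrate: dict = field(default_factory=dict)        # what the patient brought in
    trajectory: list = field(default_factory=list)       # (time, signal, value) tuples
    goals: list = field(default_factory=list)
    uncertainties: list = field(default_factory=list)    # decision-relevant only
    assumptions: dict = field(default_factory=dict)      # key -> Assumption
    next_information: list = field(default_factory=list)

    def observe(self, when: datetime, signal: str, value: float) -> None:
        """Append a timestamped observation to the trajectory."""
        self.trajectory.append((when, signal, value))

    def trend(self, signal: str) -> Optional[float]:
        """Latest minus earliest value of one signal, or None with under two points."""
        values = [v for _, s, v in sorted(self.trajectory) if s == signal]
        return values[-1] - values[0] if len(values) >= 2 else None

    def invalidate(self, key: str) -> None:
        """Mark a load-bearing assumption as no longer holding."""
        self.assumptions[key].holds = False

    def plan_needs_review(self) -> bool:
        """True if any load-bearing assumption has failed."""
        return any(not a.holds for a in self.assumptions.values())


state = PatientState(
    substrate={"immunosuppression": "tacrolimus + prednisone",
               "baseline_creatinine": "1.6", "lives_alone": "yes"},
    goals=["(hypothetical) stay out of hospital if it is safe"],
    uncertainties=["oxygen requirement over the next four hours"],
    assumptions={"can_return": Assumption("Patient can return if he worsens")},
    next_information=["repeat SpO2 on room air"],
)
state.observe(datetime(2025, 1, 1, 14, 0), "spo2", 94.0)  # triage value from section 3
state.observe(datetime(2025, 1, 1, 16, 0), "spo2", 91.0)  # invented later reading
state.invalidate("can_return")                            # e.g. no one at home tonight

print(state.trend("spo2"), state.plan_needs_review())
```

The point of the shape is that the plan carries its own failure conditions: when a load-bearing assumption flips or a trend moves, the state says the call should be looked at again.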
7. The patient on the other side of the name
Compression is necessary. Decompression is necessary. The patient is what is left.
The patient is on the other side of the name. The patient has been there the whole time. Compression is the finger; the patient is the moon. The work of clinical care has always been about looking past the one to find the other, and the best clinicians have always been the ones who can hold both at once without confusing them.
If clinical AI wants to matter for patients, it has to enter the side of the equation it has so far avoided: tracking the state, holding the trajectory, binding what we know about the disease to what is true about this person. Bigger libraries are not the answer to a problem that was never about library size. The unsolved half is hypothesis management against this patient: which hypotheses to entertain, what would change which one, what to ask next. The patient is the work. A system that wants to help has to learn to see the patient, not to retrieve more about the disease.