May 20, 2026 · 7 min read

The Highest Accurate Abstraction

Compressed Medicine · 3. The Highest Accurate Abstraction

By Sunny Harris, MD

Tuesday morning, ED triage room three. A 22-year-old with type 1 diabetes is brought in by his roommate, slurring his words and breathing fast. The intern picks up the case. Two minutes later the intern has a working impression. Here are two ways to write it.

Version one: "22yo male, type 1 DM. Confusion, dyspnea, vomiting times eight hours. Vitals: T 99.1, HR 124, RR 32, BP 102/68. Exam: dry mucous membranes, Kussmaul respirations, no focal neuro deficits. Labs: glucose 480, anion gap 24, bicarb 8, pH 7.18, ketones positive, K 5.2, BUN 28, Cr 1.4."

Version two: "22yo type 1 DM with DKA. Anion gap 24, pH 7.18. Started on fluids and insulin drip per protocol."

Same patient, same minute, two messages. The second activates the entire DKA model in the receiver's head: expected trajectory, standard treatment cascade, the potassium watch that becomes important once insulin starts, the gap-closure timeline, the cerebral edema risk in children, the eventual criteria for transitioning to subcutaneous insulin. The receiver does not have to assemble the picture from raw data. The diagnosis is the model loader. Once "DKA" lands, almost everything else follows.

The first message contains more facts. The second does more clinical work.

The principle

Use the highest accurate abstraction the receiver can decompress. The right starting compression is the most compressed representation that the sender can defend with the evidence in hand and that the receiver has the trained codec to expand back into a working model. Disease names compress more than syndrome names. Syndrome names compress more than complaint patterns. Complaint patterns compress more than raw observations. The hierarchy is a measure of how much model the receiver gets for free, not a stylistic preference.

A diagnosis is a model loader. Naming it activates the receiver's stored knowledge about presentation, trajectory, complications, treatment, and disposition. Raw data forces the receiver to reconstruct that model from scratch, every time, before any of the downstream reasoning can happen. Compressing to the highest accurate level lets the receiver spend their cognitive budget on what discriminates this case from the model, not on rebuilding the model.

The hierarchy is receiver-relative. "DKA" is the highest-density token a trained clinician can decompress, but a layperson cannot expand it; the right starting compression for the patient's family lives at a different level. An AI system trained on conversational text decompresses "DKA" differently from one trained on structured medical schemas. The same patient state needs different starting abstractions for different receivers. The sender's job is to pick the highest level the actual receiver can carry, not the highest level the sender could imagine using.
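A minimal Python sketch of the receiver-relative rule, under a simplifying assumption that is not in the text: each receiver's decoding ability is modeled as a single ceiling on the four levels (disease, syndrome, complaint pattern, raw observations). The starting level is then the lower of what the evidence supports and what the receiver can expand. The names `Level` and `starting_level`, and the ceiling values, are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    # Higher value = more compressed; order follows the four-level hierarchy.
    OBSERVATION = 0
    COMPLAINT_PATTERN = 1
    SYNDROME = 2
    DISEASE = 3


def starting_level(evidence_supports: Level, receiver_ceiling: Level) -> Level:
    """Highest level that is both defensible and decodable by this receiver."""
    return min(evidence_supports, receiver_ceiling)


# Hypothetical ceilings for illustration only; real receivers are not one number.
receivers = {
    "trained clinician": Level.DISEASE,
    "patient's family": Level.COMPLAINT_PATTERN,
}

# Same patient state: the evidence supports the disease label ("DKA").
for receiver, ceiling in receivers.items():
    print(f"{receiver}: {starting_level(Level.DISEASE, ceiling).name}")
```

Taking the minimum encodes both halves of the principle at once: the sender never claims more than the evidence earns, and never sends more than the receiver can carry.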

The hierarchy

Four levels, in descending order of compression.

Disease / diagnosis when known. The highest-density token. Activates the most stored model per character. "DKA, gap closing, K borderline low." "NSTEMI, hemodynamically stable, awaiting cath." "Uncomplicated appendicitis on CT, surgery aware." "COPD exacerbation with persistent exertional desaturation." In each case the disease name carries the model; the additional words modulate it.

Syndrome / clinical state when the disease is not yet justified. Still compresses heavily, but does not commit beyond what the evidence supports. "Undifferentiated shock." "Acute hypoxemic respiratory failure." "Febrile neutropenia." "High-risk syncope." "Sepsis physiology without source." These activate management pathways without forcing the receiver to accept an etiology that the data has not earned. The syndrome buys time for the disease-level diagnosis to be made cleanly.

Complaint pattern when even the syndrome is not defined. The chief complaint plus discriminating features. "Exertional pressure-like chest pain with left arm radiation." "Sudden maximal-onset headache with neck stiffness." "Periumbilical pain migrating to the RLQ, with fever and anorexia." "Positional vertigo with normal neuro exam." These activate the receiver's pattern recognition for the relevant differential without claiming a specific entity. The discriminating features keep the receiver from running the wrong pattern.

Raw observations only when needed. Numbers, findings, history details. These belong in the message when they are abnormal, changing, threshold-relevant, decision-relevant, surprising, or needed to justify the compression at the level above. They do not belong by default; they belong when one of those conditions applies.
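The inclusion rule for raw observations is a disjunction of six conditions, and a short Python sketch makes that explicit. The `Observation` class and its boolean flags are hypothetical; deciding whether a value is "decision-relevant" remains the clinical judgment, and the code only encodes the rule that exclusion is the default and any single condition is enough to earn a place. It is applied here to three values from the triage vignette.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    name: str
    value: str
    # One flag per condition in the text; which flags are set is a clinical call.
    abnormal: bool = False
    changing: bool = False
    threshold_relevant: bool = False
    decision_relevant: bool = False
    surprising: bool = False
    justifies_label: bool = False  # needed to defend the compression above

    def belongs_in_message(self) -> bool:
        # Default is exclusion; any single condition earns a place.
        return any((
            self.abnormal,
            self.changing,
            self.threshold_relevant,
            self.decision_relevant,
            self.surprising,
            self.justifies_label,
        ))


# Illustrative tagging of three values from the triage vignette.
vignette = [
    Observation("anion gap", "24", abnormal=True, justifies_label=True),
    Observation("pH", "7.18", abnormal=True, justifies_label=True),
    Observation("T", "99.1"),
]
print([f"{o.name} {o.value}" for o in vignette if o.belongs_in_message()])
```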

When to descend, and not before

The descent rule: pick the highest level you can defend, and descend only when the level above fails.

If the evidence supports the diagnosis, name the disease. The intern in triage three has lab values that nail the diagnosis; "DKA" is justified and is the right starting compression.

If the evidence supports a syndrome but the etiology is open, name the syndrome. A hypotensive patient with cold extremities and an unknown source has "undifferentiated shock" but not yet "septic shock" or "cardiogenic shock"; using the syndrome keeps the receiver's mind open across the differential that drives the next workup.

If even the syndrome is not pinned down, use the complaint pattern. The chest-pain patient whose ECG is ambiguous, troponin pending, and exam unremarkable does not have a syndrome yet. "Exertional pressure-like chest pain with risk factors for ACS" is the honest compression.

Raw observations dominate only when the model has not yet been built. The undifferentiated patient in their first ten minutes lives at the observation level. Compression to a higher level is the work of those ten minutes.
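The descent rule is an ordered search from the top of the hierarchy. A minimal Python sketch, assuming the clinician (or an upstream model) has already decided which levels have a defensible label; that judgment is the hard part and is not modeled here. `Level` and `choose_label` are hypothetical names, and the cases reuse the examples above.

```python
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    OBSERVATION = 0
    COMPLAINT_PATTERN = 1
    SYNDROME = 2
    DISEASE = 3


def choose_label(defensible: dict[Level, Optional[str]]) -> tuple[Level, str]:
    """Return the highest level whose label the evidence supports.

    `defensible` maps a level to its supported label, or to None when that
    level is not yet earned. Levels are tried from DISEASE downward, so a
    lower level is used only when every level above it has failed.
    """
    for level in sorted(Level, reverse=True):
        label = defensible.get(level)
        if label:
            return level, label
    # Nothing above has been earned yet: the first minutes of the workup.
    return Level.OBSERVATION, "raw observations"


cases = {
    "triage three": {Level.DISEASE: "DKA"},
    "hypotensive, cold, no source": {
        Level.DISEASE: None,
        Level.SYNDROME: "undifferentiated shock",
    },
    "ambiguous chest pain": {
        Level.COMPLAINT_PATTERN: "exertional pressure-like chest pain with risk factors for ACS",
    },
    "first ten minutes": {},
}
for name, defensible in cases.items():
    level, label = choose_label(defensible)
    print(f"{name}: {level.name} -> {label}")
```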

The failure modes

Two compression failures matter at this layer.

Premature compression. Using a level higher than the evidence supports. "Sepsis" before a source can even be considered. "ACS" before any cardiac biomarker has come back. "Anxiety" before anyone has asked whether the patient is postpartum. The premature label loads the receiver's model in a direction the data does not yet justify, and once loaded, that model anchors the rest of the workup. Premature compression is how anchoring errors propagate through teams: the leader's first hypothesis becomes a label, the label becomes the working model, and disconfirming evidence has to fight through an extra layer.

Lazy compression. The opposite failure: refusing to compress when the evidence justifies it. The chart note that lists vital signs and lab values when the diagnosis is established and the team is already managing to it. The handoff that recites the history when the disease is named and the next four hours' work is what matters. The clinical AI dashboard that returns lists of observations when the patient has a clear diagnosis the system could be operating on. Lazy compression offloads the model-building work onto the receiver, every time, even when the receiver already knows the model.

The same working rule guards against both failures. Compress to the highest level the evidence supports. Not higher. Not lower.
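Because the levels are ordered, the two failure modes reduce to a comparison between the level a message uses and the level the evidence supports. A minimal Python sketch; `audit` is a hypothetical helper, and it assumes the supported level has already been established, which is where the real difficulty lies.

```python
from enum import IntEnum


class Level(IntEnum):
    OBSERVATION = 0
    COMPLAINT_PATTERN = 1
    SYNDROME = 2
    DISEASE = 3


def audit(emitted: Level, supported: Level) -> str:
    """Classify a message by comparing its level to what the evidence earns."""
    if emitted > supported:
        return "premature compression"  # claims more than the data supports
    if emitted < supported:
        return "lazy compression"  # leaves model-building to the receiver
    return "highest accurate abstraction"


# "ACS" (a syndrome label) while only a complaint pattern is defensible.
print(audit(Level.SYNDROME, Level.COMPLAINT_PATTERN))
# A note that lists vitals and labs once the diagnosis is established.
print(audit(Level.OBSERVATION, Level.DISEASE))
# "DKA" in triage three.
print(audit(Level.DISEASE, Level.DISEASE))
```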

The clinical-AI implication

A clinical AI system inherits this principle. The output should start at the highest abstraction the data and reasoning support, and descend only on demand or when the level above is not yet defendable.

A summary that lists symptoms when the chart already encodes a diagnosis is doing less work than it could. A differential generator that returns "ACS, PE, GERD, anxiety" when the evidence supports "low-risk chest pain with a nonischemic ECG and negative serial troponins" is failing to compress at the syndrome level the case has earned. A scribe that captures the conversation but never proposes the disease-level token is offloading the highest-value compression step back to the clinician.

The opposite failure is worse in its own way. An AI that asserts "sepsis" on the strength of a single abnormal lactate, or "ACS" on the strength of nonspecific chest pain, performs premature compression with confidence; the clinician then has to fight not only the bad inference but the model that has already loaded around it.

The principle for the system is the same as for the clinician: start at the highest accurate abstraction. Defend it with the evidence. Descend cleanly when the evidence does not yet support the level above.
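For a clinical AI output, "descend only on demand" suggests a layered summary: the defended abstraction first, the evidence that defends it one level down, and remaining observations only when the reader asks. A minimal Python sketch; `LayeredSummary` and its `depth` parameter are hypothetical, not an existing API, and the content reuses the triage-three vignette.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredSummary:
    headline: str  # highest abstraction the data and reasoning support
    evidence: list[str] = field(default_factory=list)  # what defends the headline
    detail: list[str] = field(default_factory=list)  # everything else, on request

    def render(self, depth: int = 0) -> str:
        """depth 0: headline only; 1: add evidence; 2 or more: add detail."""
        parts = [self.headline]
        if depth >= 1:
            parts += self.evidence
        if depth >= 2:
            parts += self.detail
        return "; ".join(parts)


summary = LayeredSummary(
    headline="Type 1 DM with DKA",
    evidence=["anion gap 24", "pH 7.18"],
    detail=["T 99.1", "BP 102/68", "BUN 28"],
)
print(summary.render())         # default view
print(summary.render(depth=1))  # clinician asks "why DKA?"
print(summary.render(depth=2))  # full descent
```

The default view is the compressed one; the reader pays for more detail only by asking, which is the opposite of a dashboard that starts from the observation list.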

At the bedside

The intern in triage three has done the hard part: built the model from raw observations in two minutes, compressed to the disease-level token, and emitted a three-sentence note that activates the receiver's full DKA framework. The protocol runs. The fluids hang. The insulin drip starts. The next clinician picks up a case they already understand. The compression did the coordination work.

The next two parts operationalize what happens after the abstraction is chosen. Part 4 handles the order in which the elements unfold within the chosen level. Part 5 handles what to leave out at every level. Both depend on this one: the right level first, then everything else.

