May 23, 2026 · 18 min read

What Determines Whether Coordination Survives

Companion to Compressed Coordination · The full multi-axis framework

By Sunny Harris, MD

Coordination between bounded agents has more failure modes than the bounded-resources axis can carry alone. The Compressed Coordination foundation essay developed that axis carefully — M < N, the inequality, the failure modes, the defenses — and acknowledged at the end that other axes exist. This is the essay that develops them.

Six axes shape whether two agents can coordinate. Bounded resources is one. The others are learning history, substrate heterogeneity, channel properties, incentive structure, and system-level structure. Each produces distinct failure pressure that doesn't reduce to the others. Coordination failure is a surface in this six-axis space; what we observe in any particular system is a projection of that surface onto the channel of measurement we happen to have.

The framework is testable to the extent we can distinguish, from the observable signatures of a failure, which axis is producing it. Most of this essay is about that methodology and the predictions that survive it.

The axes

Resource bounds

Memory, compute, attention, deadline. The axis the foundation essay developed: M < N produces compression, compression underdetermines, reconstruction fills gaps. Each sub-axis produces a characteristic failure pressure. Memory limits produce premature commitment — the agent can't hold every alternative, so the leader gets protected. Compute limits produce search-shortcut errors — exact inference is unaffordable, approximations introduce bias. Attention limits produce missed signals — the cue was in the field but unprocessed. Deadline pressure produces premature commits — the agent acts before it would have otherwise verified. Resource bounds are the most carefully studied axis (information theory, rate-distortion, partially observable Markov decision processes) because they're the easiest to formalize. They're also the easiest to relax in tests: more memory, more bandwidth, more compute, more time. That makes them the natural starting point for any framework, and the easiest one to mistake for the whole.

Learning history

Training data exposure, prior structure, representational and model-class capacity, ontology. Agents with identical runtime resources can diverge because they learned different partitions from different data. The clinician trained at one institution carries a partition over chest-pain presentations that differs from one trained elsewhere, even though both can process the same patient at runtime. An AI model trained on one distribution will encode anxiety differently from a model trained on another, even though both will answer the prompt. This axis produces what the curriculum called "shared vocabulary, divergent state": the tokens match on the wire while the equivalence classes behind them differ. It survives infinite runtime resources. The defenses are shared learning history (which forces alignment but is expensive) and explicit ontology bridging (translation infrastructure, alignment layers, cross-system vocabularies). Cross-institutional medicine, cross-organizational data flows, and cross-model coordination all live primarily on this axis.

Substrate heterogeneity

Different sensor modalities, effector types, computational architectures — what the agent can represent at all. A vision-language model and a clinical-record-only model can attempt to coordinate, but they cannot share certain representational atoms. The radiologist holds an image; the database holds structured fields; the LLM holds tokens. Each can encode the others' content imperfectly, but the encoding loses information that the source substrate carried natively. The failure mode is modality gap — communications across substrates where one side has structural capacity to represent something the other does not. The defense is bridge representations (cross-modal embeddings, structured translation layers, shared latent spaces) that explicitly model what each substrate can carry. As multi-modal AI scales, this axis becomes more binding rather than less, because each new modality introduces new substrate that the others may not match.

Channel properties

Bandwidth, reliability, signal-to-noise ratio, latency, medium, authentication. Bandwidth was the foundation essay's primary channel-level concern, but the others matter independently. A high-bandwidth channel with high noise produces corruption-driven divergence — the signal arrives mangled, and reconstruction lands on a wrong pattern. A high-bandwidth channel with packet loss produces silent gaps the receiver doesn't know are gaps. A high-bandwidth channel with no authentication produces spoofing: the receiver attributes the message to the wrong sender, with downstream consequences for trust and provenance. Latency interacts with deadline pressure: a high-bandwidth channel that takes minutes to deliver is, for an action that must commit in seconds, no channel at all. Each channel property has its own failure signature; the defenses are property-specific (error correction for noise, retransmission for loss, signing for authentication, queue management for latency).
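Two of the property-specific defenses above can be sketched in a few lines: sequence numbers turn silent gaps into visible ones, and a keyed signature lets the receiver reject frames it cannot attribute to the sender. This is a minimal sketch, not a protocol. The frame layout, the pre-shared key, and the function names `sign` and `receive` are assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical pre-shared key; a real deployment would manage keys properly.
SHARED_KEY = b"demo-key-not-for-production"


def sign(seq: int, payload: bytes, key: bytes = SHARED_KEY) -> bytes:
    """HMAC-SHA256 tag covering both the sequence number and the payload."""
    return hmac.new(key, seq.to_bytes(8, "big") + payload, hashlib.sha256).digest()


def receive(frames, key: bytes = SHARED_KEY):
    """Return (accepted payloads, missing sequence numbers, rejected sequence numbers)."""
    accepted, rejected, seen = [], [], set()
    for seq, payload, tag in frames:
        # Constant-time comparison of the received tag with the expected one.
        if hmac.compare_digest(tag, sign(seq, payload, key)):
            accepted.append(payload)
            seen.add(seq)
        else:
            rejected.append(seq)
    # Any unseen number at or below the highest authenticated one is a known gap.
    highest = max(seen, default=-1)
    missing = [s for s in range(highest + 1) if s not in seen]
    return accepted, missing, rejected


frames = [(i, f"msg-{i}".encode(), sign(i, f"msg-{i}".encode())) for i in range(4)]
del frames[1]                                  # simulate a lost frame
frames.append((4, b"spoofed", b"\x00" * 32))   # simulate a frame with a bad tag
accepted, missing, rejected = receive(frames)
```

Here the receiver ends up knowing that frame 1 never arrived and that frame 4 cannot be trusted, instead of reconstructing silently around either. Noise and latency need their own defenses (error correction, queue management), which this sketch does not cover.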

Incentive structure

Utility function alignment, trust, adversarial vs cooperative, fault model. The foundation essay introduced this axis as the source of strategic underdetermination — the failure mode that emerges when sender and receiver have misaligned utilities. But the axis is broader than strategic ambiguity alone. Trust assumptions determine whether the receiver even attempts verification; fault models determine which kinds of corruption are anticipated and defended against. A system designed for honest-but-buggy senders fails catastrophically against an adversarial sender; a system designed for Byzantine adversaries spends massive overhead defending against threats that don't exist in cooperative settings. Incentive structure is the axis where most institutional and political coordination failures live — not because the participants are bounded but because their objectives differ. Strategic underdetermination is the canonical failure mode; trust collapse and adversarial exploitation are the others.

System-level structure

Synchrony, public-signal availability, common-knowledge structure, network topology, N-party dynamics. This axis only becomes visible in multi-agent settings; dyadic coordination cannot surface it. The classical results — Lamport's logical clocks, the coordinated-attack impossibility, Fischer-Lynch-Paterson on asynchronous consensus — show that coordination has failure modes irreducible to individual capacity bounds. Three honest agents with infinite resources cannot establish common knowledge over an unreliable channel, regardless of how many acknowledgment rounds they run. A network of fast agents with finite synchrony can deadlock or drift even when each pair is coordinating fine. As agentic AI scales beyond dyadic handoffs, this axis becomes more binding. The defenses come from the distributed-systems tradition: consensus protocols, broadcast structures, vector clocks, leader election. Most of them assume specific reliability and synchrony conditions and degrade in their absence.
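Of the defenses named above, vector clocks are the easiest to show in miniature. The sketch below uses the standard rules (increment your own counter on a local event, take the element-wise maximum on receive) to tell whether one agent's state causally precedes another's or whether the two acted concurrently, without sight of each other. The agent names and the dict-based representation are assumptions for illustration.

```python
def tick(clock: dict, agent: str) -> dict:
    """Local event: increment the agent's own counter."""
    out = dict(clock)
    out[agent] = out.get(agent, 0) + 1
    return out


def merge(local: dict, received: dict, agent: str) -> dict:
    """On receive: element-wise max, then count the receive as a local event."""
    keys = set(local) | set(received)
    merged = {k: max(local.get(k, 0), received.get(k, 0)) for k in keys}
    return tick(merged, agent)


def happened_before(a: dict, b: dict) -> bool:
    """True if a is <= b in every component and strictly less in at least one."""
    keys = set(a) | set(b)
    no_greater = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    some_less = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_greater and some_less


def concurrent(a: dict, b: dict) -> bool:
    """Neither state causally precedes the other."""
    return not happened_before(a, b) and not happened_before(b, a)


a1 = tick({}, "A")        # agent A acts
b1 = tick({}, "B")        # agent B acts without having heard from A
b2 = merge(b1, a1, "B")   # B then receives A's message
```

In this run `a1` and `b1` are concurrent, while `a1` happened before `b2`. Vector clocks detect that two agents acted on mutually unseen state; they do not by themselves resolve the disagreement.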

Six axes. Each produces coordination failure independently. None reduces cleanly to the others. The framework's job is to predict which one is binding in any given system — and to surface the observable signatures that let you diagnose it.

Why this is a surface, not a list

The natural temptation is to map each axis to a specific failure mode: bandwidth produces silent reconstruction, memory produces asymmetric updating, learning history produces shared-vocabulary divergent state. A neat one-to-one mapping. The framework would be diagonal — each axis owns one failure.

It isn't. The mapping is many-to-many. Silent reconstruction can come from any combination of:

Bandwidth bound (the channel couldn't carry the discriminator)
Attention limit (the receiver didn't foveate the discriminator)
Compute shortcut (the receiver matched approximately and the discriminator didn't pass threshold)
Memory loss of caveat (the receiver had the discriminator but discarded it to free room)
Representational coarseness (the receiver's stored patterns don't differentiate along the discriminator's dimension)
Learning history mismatch (the receiver was trained on data where the discriminator wasn't predictive)

Same failure mode, six different causal pathways. The diagonal map is wrong.

What replaces it is a sensitivity matrix. Rows are observables — measurable signatures of coordination failure. Columns are axes — primitive resources or properties that can be intervened on. Entries are the size and direction of the effect when you intervene on that axis. Coordination failure is a surface in this matrix's space; particular observed failures are projections.

The observables worth measuring include:

Path dependence / order sensitivity. Does the receiver's reconstruction depend on the order in which evidence arrived? (Likely loaded by: memory, compute, deadline)
Confident inference of omitted specifics. When the message lacked a discriminator, did the receiver fill confidently rather than ask? (Likely loaded by: channel, attention, representation)
Lexical agreement with behavioral divergence. Did both agents use the same words while acting on different interpretations? (Likely loaded by: learning history, ontology, substrate)
Sender ambiguity correlating with sender incentives. Did the sender choose ambiguous formulations precisely where their utility favored the receiver landing in a specific class? (Likely loaded by: incentive structure)
Common-knowledge gap signatures. Did multi-agent action diverge in ways that pairwise coordination would have prevented if global state had been visible? (Likely loaded by: system-level structure)

A useful framework is one whose sensitivity matrix columns are distinguishable enough to identify which axis is binding given the available observables. If the columns collapse — relaxing different axes produces the same observable shifts — the framework is a single-axis framework wearing six masks.

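This is the actual test. Not "do the failure modes match the bounds named in the curriculum" (a low bar, easily met by any framework that names anything). The test is: does the bound-to-observable sensitivity matrix have rank close to six? If yes, the multi-axis framework is doing work a one-axis framework can't. If no, the multi-axis framework is decorating something simpler. One caveat on the arithmetic: a matrix's rank cannot exceed its number of rows, so the test needs at least six independent observables; the five listed above are a starting set, not the full instrument.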
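To make the rank test concrete, here is a minimal sketch that computes the numerical rank of a sensitivity matrix by Gaussian elimination with a tolerance. The matrices are invented placeholders, not measurements: `distinct` stands for six observables responding differently to six axes, `collapsed` copies one column onto another twice (as if two pairs of axes produced identical shifts), and `five_obs` keeps only five observable rows.

```python
def numerical_rank(matrix, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting.

    Pivots with absolute value at or below `tol` are treated as zero.
    """
    rows = [list(map(float, r)) for r in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) <= tol:
            continue  # no usable pivot: this column depends on earlier ones
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, n_rows):
            factor = rows[r][col] / rows[rank][col]
            for c in range(col, n_cols):
                rows[r][c] -= factor * rows[rank][c]
        rank += 1
    return rank


# Placeholder 6x6 matrix: strong diagonal effects plus weak cross-effects.
distinct = [[1.0 if i == j else 0.2 for j in range(6)] for i in range(6)]

# Same matrix, but column 1 duplicates column 0 and column 3 duplicates column 2.
collapsed = [row[:] for row in distinct]
for row in collapsed:
    row[1] = row[0]
    row[3] = row[2]

# Only five observables measured: rank is capped at five whatever the axes do.
five_obs = distinct[:5]

ranks = (numerical_rank(distinct), numerical_rank(collapsed), numerical_rank(five_obs))
```

With measured data the tolerance matters a great deal: noise makes every matrix look full rank at `tol=1e-9`, so in practice the threshold would have to be set relative to the experimental noise floor, and a singular-value analysis would likely be more informative than a hard cutoff.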

Coordination failure is a surface in axis-space. Any particular failure we observe is a projection of that surface onto a specific measurement channel. The framework's promise is that the surface is rich enough — and the projections informative enough — that diagnosis is possible. Whether the promise holds is an empirical question; the rest of this essay names the predictions and the tests that would settle it.

What the framework predicts that survives this rigor

Four predictions follow from the multi-axis structure. Each has a stated falsifier — the test that would show the framework is wrong about that prediction. Together they map the framework's load-bearing claims.

Recursive codec drift between adapting agents

When two adaptive agents are connected by a bounded channel and both can update their compressions based on the other's responses, they will co-evolve. Each agent learns which signals get the response it wanted; each agent reweights its sending behavior toward what worked. Over time, the dyad converges on a private codec optimized for the channel and the partner — increasingly efficient for the pair, increasingly detached from external interpretability.

This is the prediction the foundation essay's original framework could not make. The clinical curriculum assumed culturally-anchored vocabulary (medical training, standardized order sets) that fixed the codec. The multi-axis framework removes that assumption: with learning history dynamic and the channel bounded, the codec drifts.

Mechanism. The bounded channel forces compression. The dyadic learning loop reinforces compressions that the partner responds to. Without an external anchor pulling the codec back toward shared infrastructure (a third party, a regulator, a human reader, a training-time penalty), the codec converges on whatever maximizes dyadic mutual information regardless of external readability.

Observable. Dyadic mutual information rises monotonically over training; mutual information between dyadic output and external reference distributions falls. A specific signature: human readers can interpret early-training exchanges and find them coherent; later-training exchanges become progressively word-salad-like to humans while remaining maximally informative within the dyad.

Test. Train two reinforcement-learning agents to coordinate on a task with a private channel. Vary channel narrowness and learning rate. Predict: rate of drift scales with both. Measure mutual information with a held-out human-interpretable distribution at intervals during training.
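The measurement in this test can be sketched with a plug-in mutual-information estimate over logged (symbol, meaning) pairs. Everything here is an assumption for illustration: the symbols, the reference labels, and the two toy logs, which stand in for an early checkpoint where symbols track an external reference and a late checkpoint where they no longer do.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # Sum of p(x,y) * log2( p(x,y) / (p(x) p(y)) ) over observed pairs.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# Toy logs of (channel symbol, human-interpretable reference label).
early = [("a", "red"), ("b", "blue")] * 50
late = [("a", "red"), ("a", "blue"), ("b", "red"), ("b", "blue")] * 25

mi_early = mutual_information(early)  # symbols fully determine the reference label
mi_late = mutual_information(late)    # symbols carry no information about it
```

The same function applied to (sender symbol, receiver action) pairs would give the dyadic side of the comparison; the predicted signature is that one series rises while the other falls. The plug-in estimator is biased upward on small samples, so a real experiment would need enough logged exchanges per checkpoint or a bias-corrected estimator.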

Falsifier. Adapting-agent dyads that don't drift — coordination success remains stable, but the codec remains human-interpretable indefinitely. Or: drift occurs at the same rate regardless of channel narrowness or adaptation rate. Either would mean the framework's account of dyadic codec dynamics is wrong.

If this prediction holds, the implication for AI-to-AI handoff is sharp: any deployed agentic system where both ends adapt is at risk of drifting away from the human-readable codec the system was supposed to support. The defense is either to anchor the codec (training-time penalty for communication that is not human-readable, third-party verification, human-readable intermediate forms) or to expect the drift and instrument for it.

Axis-specific intervention sensitivities are real and measurable

The framework claims six axes. If they're really distinct, intervening on one should produce a different observable signature than intervening on another. The sensitivity matrix should have distinguishable columns. This is the most direct test of the multi-axis structure.

Mechanism. Each axis represents a distinct causal pathway from system property to coordination failure. Memory limits produce specific shifts in order-sensitivity and prior-weighting; bandwidth limits produce shifts in omission-fill rates; learning history mismatch produces shifts in lexical-behavior divergence. The mechanisms are different even when the observed failure mode is superficially similar.

Observable. The sensitivity matrix — measurable in a controlled multi-agent simulation by manipulating one axis at a time and recording shifts in path-dependence, omission-conditioned hallucination, ontology gap, strategic ambiguity, and common-knowledge gap observables.

Test. Build a multi-agent simulation where each axis can be independently controlled, for example: per-agent working memory and compute budget (resource bounds), training-distribution overlap (learning history), modality overlap (substrate heterogeneity), channel rate (channel properties), utility alignment (incentive structure), and network connectivity (system-level structure). Hold five axes constant; vary the sixth across a range; measure each observable. Repeat for each axis. Construct the sensitivity matrix.

The prediction: the matrix has rank close to six. The columns are distinguishable enough that, given a vector of observables from a real system, the framework can identify which axis is most likely binding.
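The one-axis-at-a-time procedure and the diagnosis step can be sketched as follows. The simulator is a hypothetical linear stand-in with invented coefficients, not a model of any real system. It is deliberately built so that learning history and substrate load the same observable, which shows what a collapsed pair of columns looks like.

```python
import math

AXES = ["resources", "learning_history", "substrate", "channel", "incentives", "system"]
OBSERVABLES = ["path_dependence", "omission_fill", "lexical_divergence",
               "strategic_ambiguity", "common_knowledge_gap"]


def toy_simulator(settings):
    """Hypothetical stand-in for a multi-agent simulation; coefficients are invented."""
    r, l, s, c, i, y = (settings[a] for a in AXES)
    return [
        1.0 - 0.8 * r,             # path dependence
        1.0 - 0.5 * c - 0.3 * r,   # omission fill
        1.0 - 0.6 * l - 0.4 * s,   # lexical divergence
        1.0 - 0.9 * i,             # strategic ambiguity
        1.0 - 0.9 * y,             # common-knowledge gap
    ]


def sensitivity_matrix(simulate, baseline, step=0.1):
    """Relax one axis at a time; return {axis: column of finite-difference slopes}."""
    base = simulate(baseline)
    columns = {}
    for axis in AXES:
        relaxed = dict(baseline, **{axis: baseline[axis] + step})
        shifted = simulate(relaxed)
        columns[axis] = [(s - b) / step for s, b in zip(shifted, base)]
    return columns


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_likely_axis(columns, observed_shift):
    """Axis whose column points most nearly in the direction of the observed shift."""
    return max(columns, key=lambda a: cosine(columns[a], observed_shift))


baseline = {a: 0.5 for a in AXES}
columns = sensitivity_matrix(toy_simulator, baseline)
guess = most_likely_axis(columns, [0.0, -0.2, 0.0, 0.0, 0.0])
overlap = cosine(columns["learning_history"], columns["substrate"])
```

In this toy, a shift confined to omission fill is attributed to the channel axis, while the learning-history and substrate columns have cosine similarity of about 1, meaning no observable in this set can tell them apart. Cosine matching is one simple choice of diagnostic; it ignores noise and interactions between axes.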

Falsifier. The sensitivity matrix is degenerate. Relaxing memory and relaxing bandwidth produce the same observable shifts (within experimental noise). Relaxing learning-history mismatch and relaxing substrate heterogeneity are indistinguishable. The framework's six axes collapse into two or three.

If this prediction fails, the multi-axis framework should be pruned to whatever rank the matrix actually has. The honest framework might be three-axis (capacity, learning history, incentives) rather than six. The current framework's elegance (six distinct axes, each generating distinct failure pressure) would be an illusion.

This is the central methodological prediction. The other three predictions depend on the framework being rank-six. If it isn't, those predictions still might hold, but for reasons other than the ones the framework names. The rank of the sensitivity matrix is what makes the framework testable rather than merely descriptive.

Strategic underdetermination scales with verification cost

The fourth breakdown class the foundation essay named — strategic underdetermination — requires three ingredients: misaligned utility, residual ambiguity, and bounded verifiability. The framework predicts that as the third ingredient (verification cost) rises, strategic underdetermination opportunities increase super-linearly. The more expensive it is to check, the more incentive the sender has to be strategically ambiguous.

This is the AI-builder-audience prediction. It matters because verification cost is rising rapidly in production AI deployments: multi-modal context expands what would need to be verified; AI-to-AI handoff removes the human verifier from the loop; automated review at scale makes individual-message audit infeasible.

Mechanism. When verification is cheap (a single read-back from a human attending), strategic underdetermination rarely pays: the sender has to be precise because being vague gets caught. When verification is expensive (multi-modal AI output across thousands of cases per day with no human review), strategic underdetermination becomes a viable strategy: vague messages succeed at landing receivers in the sender-preferred class because the cost of checking is too high for the receiver to bear systematically.

Observable. Platforms running agentic AI under conflicting platform-side and user-side utilities should exhibit measurable patterns of strategic ambiguity correlated with platform incentives. Specifically: response variability on user requests that conflict with platform engagement objectives should be higher than on requests that don't conflict. The platform's AI should produce vaguer answers, more equivocation, and more "let me know if you need more help" hedges precisely on the requests where unambiguous answering would cost the platform engagement.

Test. Audit AI assistant deployments by paired request structure. Construct request pairs where one conflicts with a platform-side incentive (e.g., "what's the cheapest alternative to your premium service") and one doesn't (e.g., "what's the cheapest alternative to a generic competitor"). Measure response specificity, decisiveness, and disambiguating-question rate. Predict: the conflicting-request response should be measurably more ambiguous.
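One way to analyze such an audit is a paired sign-flip permutation test on specificity scores. The sketch below assumes each response has already been scored on a 0 to 1 specificity rubric (the rubric itself is not specified here), and the scores are made-up placeholders, not audit results.

```python
import random
from statistics import mean


def paired_permutation_test(conflict, control, n_resamples=10_000, seed=0):
    """One-sided sign-flip permutation test on paired scores.

    Returns (mean of control - conflict, p-value). A small p-value suggests the
    conflicting-request responses are less specific than their paired controls.
    """
    diffs = [c - k for k, c in zip(conflict, control)]
    observed = mean(diffs)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under the null, the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if mean(flipped) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_resamples + 1)


# Placeholder specificity scores for eight request pairs (invented for illustration).
conflict_scores = [0.40, 0.50, 0.30, 0.60, 0.45, 0.50, 0.35, 0.40]
control_scores = [0.70, 0.65, 0.60, 0.80, 0.70, 0.60, 0.65, 0.75]

gap, p_value = paired_permutation_test(conflict_scores, control_scores)
```

Pairing controls for request topic but not for everything: if the conflicting requests are also harder or less well covered in training data, a gap would appear for reasons other than incentives, so the pairs need to be matched on those variables as well.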

Falsifier. Response variability is independent of incentive-conflict structure. The pattern of strategic ambiguity, if it exists, follows some other variable (request difficulty, training-distribution coverage, model uncertainty) and not incentive structure. The framework's claim about strategic underdetermination scaling with incentive-misalignment plus verification-cost would then be wrong about the mechanism.

If the prediction holds, the policy implication is sharp: platforms running agentic AI need verification infrastructure (third-party audits, structured response logging, incentive disclosure) that the current product landscape mostly lacks.

Multi-agent system-level failures don't reduce to individual capacity

The sixth axis — system-level structure — exists because some coordination failures persist regardless of how capable individual agents become. The framework predicts that as agentic AI scales beyond dyadic interaction, network-level failure modes will emerge that look like coordination breakdown but aren't bound-driven on any individual agent.

This is the multi-agent prediction. It's the one most likely to surprise builders, because the prevailing intuition is that scaling per-agent capability scales coordination quality.

Mechanism. Classical distributed-systems results (Lamport on logical time, the coordinated-attack problem, Fischer-Lynch-Paterson on consensus under asynchrony) show that certain coordination outcomes cannot be guaranteed over unreliable or asynchronous networks regardless of individual agent capacity. Common knowledge cannot be established over a faulty channel; consensus cannot be reached deterministically under asynchrony with even one faulty node. The bottleneck is the network's structural properties, not the agents'.

Observable. Multi-agent agentic AI systems should exhibit failure modes whose rate scales with channel reliability and network topology, not with per-agent capacity. Specifically: as you scale per-agent compute, memory, and context-window in a multi-agent system while holding channel reliability fixed, some coordination failures should decline (the capacity-bounded ones), but others should not (the system-level ones).

Test. In a multi-agent AI deployment, hold channel reliability and network topology fixed and scale per-agent capacity over a range. Predict: the failure-rate curve has two regimes — one where capacity scaling reduces failures (the bound-driven regime), and one where it doesn't (the system-level regime). The transition point depends on how reliable the underlying communication infrastructure is.
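The predicted two-regime curve can be illustrated with a deliberately simple toy model. It assumes a handoff fails if the message is lost (probability `p_loss`, a channel property) or if the receiver misdecodes it (probability 2^-capacity, a per-agent property). Both functional forms and all numbers are assumptions chosen for illustration, not estimates from any deployment.

```python
def failure_rate(capacity_bits: int, p_loss: float) -> float:
    """Toy model: success needs both delivery and correct decoding."""
    p_decode = 1.0 - 2.0 ** (-capacity_bits)
    return 1.0 - (1.0 - p_loss) * p_decode


def plateau_start(capacities, rates, eps=1e-3):
    """First capacity after which the next scaling step improves by less than eps."""
    for i in range(1, len(rates)):
        if rates[i - 1] - rates[i] < eps:
            return capacities[i - 1]
    return None  # no plateau within the tested range


capacities = [1, 2, 4, 8, 16, 32]
rates = [failure_rate(c, p_loss=0.05) for c in capacities]
plateau = plateau_start(capacities, rates)
```

In this toy, scaling capacity from 1 to 8 cuts the failure rate sharply, after which the curve flattens at the channel loss rate of 0.05 and further capacity buys almost nothing. The falsifier corresponds to `plateau_start` returning `None` across the whole tested range while failures keep falling toward zero.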

Falsifier. Every coordination failure declines monotonically with per-agent capacity. There is no system-level regime that scaling cannot fix. The framework's claim that system-level structure is an independent axis would be wrong; everything reduces to individual capacity bounds after all.

If the prediction holds, the implication for multi-agent AI infrastructure is that the field needs to build the distributed-systems tradition's toolkit (consensus protocols, broadcast structures, vector clocks, leader election) into agentic frameworks before they scale. Without it, scaling per-agent capability will hit a wall whose cause is invisible from inside the per-agent perspective.

What can't be predicted yet

The multi-axis framework is more honest than its single-axis predecessor, but it's not complete. Several questions remain open.

How do axes interact nonlinearly? Relaxing memory while bandwidth is binding might help (more buffer for unresolved alternatives) or hurt (more confidently wrong reconstructions held for longer). The framework names six axes but doesn't yet predict their interaction structure. A two-axis sensitivity matrix would have entries; a three-axis tensor would have interaction terms; the framework doesn't yet specify the form of the interactions.

Is substrate heterogeneity irreducible? The third axis — sensor modalities, effector types, computational architectures — might be a distinct axis or might be a slow form of "different learning histories applied to different channels." The framework treats it as independent in this essay. Empirical testing could collapse it into other axes if the sensitivity-matrix columns turn out to be linearly dependent.

Is energy a seventh axis or a meta-budget? In biological systems, energy is the fundamental constraint that prices all other resources. In compute systems, energy increasingly does the same. The framework currently parks energy as a meta-budget that constrains the other axes; this might be the right move or it might miss something important. The same question applies to thermal constraints, physical size, and other substrate-level economics.

Does the framework scale to N > 10 agents? All the multi-agent reasoning in this essay implicitly assumes small networks where pairwise coordination dynamics are primary. At larger N, emergent network properties (modularity, clustering, scale-free distributions) become first-class concerns. The framework might extend cleanly or might break down at network scale.

Where do priors come from in adaptive systems? The learning-history axis assumes priors are inherited from training; in continually-learning agents, priors are also shaped by current interaction. The boundary between "learning history" and "current state" blurs. The framework doesn't yet handle this cleanly.

These are loose threads. They mark where the framework is suggestive but not yet operational. The next several years of multi-agent AI deployment will produce the empirical data that would tighten or untighten them. The framework's claim, narrowed: it identifies the questions worth asking, even where it doesn't yet answer them.

The applied closing

For builders, three implications.

Instrument for sensitivity, not for failure. The temptation in production AI systems is to measure outcomes — did the request succeed, did the action complete, did the user disengage. These are necessary but insufficient. The framework's value is in measuring which observable signatures shifted, which lets you diagnose which axis is binding. A system that instruments for path-dependence, omission-fill rates, ontology gaps, strategic ambiguity, and common-knowledge gaps gives you a sensitivity profile per deployment. The aggregate failure rate tells you something broke; the sensitivity profile tells you what.

Don't optimize one axis past the binding constraint of another. A lot of AI infrastructure investment goes into scaling axes that aren't binding for the current deployment's coordination failures. Doubling the model's context window doesn't help if the binding constraint is utility misalignment between platform and user. Adding more agents to a multi-agent system doesn't help if the channel between them is the bottleneck. The framework's prescription is diagnostic-first: identify the binding axis, then invest in relaxing it, then re-measure. The diagnostic comes from the sensitivity profile, not from intuition about what should help.

Multi-agent coordination has system-level failure modes you can't fix by making individual agents better. This is the hardest implication for AI builders trained on scaling laws. Per-agent capacity, model size, and training compute do not address common-knowledge failures, FLP impossibility, or topology-driven coordination breakdown. As agentic AI scales beyond dyadic handoffs into networks of cooperating agents, the distributed-systems tradition's toolkit becomes mandatory — consensus protocols, broadcast structures, vector clocks, leader-election mechanisms. Building these into agentic frameworks before they're needed is much cheaper than after.

For theorists, one implication: the framework now generates predictions, not just descriptions. The four predictions above are testable in controlled multi-agent simulations, and the sensitivity-matrix methodology is the test of whether the framework's six axes are real or decorative. The framework can be wrong in specific, useful ways. That's what makes it a framework rather than a vocabulary.

The clinical curriculum was the first place this architecture got stress-tested in production at high stakes. The foundation essay developed the bounded-resources axis. This essay extends to the full multi-axis space. The instantiations beyond medicine — multi-modal AI, AI-to-AI coordination, distributed systems, biology, science, cross-cultural communication, organizational and institutional design — each present different configurations of the same six-axis space. The framework's promise is that diagnosis transfers across them: aviation's read-back discipline (a channel-axis defense) lifts to clinical AI; clinical AI's structured belief states (a representational-axis defense) lift to multi-modal model design; distributed systems' provenance discipline (a system-level-axis defense) lifts to AI-to-AI protocol.

Same axes, different binding constraints, portable mitigations. That is the multi-axis framework, narrowed to its honest claim: coordination architectures are universal because the axes are universal, but the framework's job in any particular regime is to predict which axis will bind first and to surface the observable signatures that let you tell.

The clinical curriculum stress-tested one configuration. The next configurations are coming. The framework is a useful lens; whether it bites in production — at multi-modal AI scale, at agentic AI handoff scale, at distributed-system scale — is the empirical question still ahead.

A standalone companion to the Compressed Coordination foundation essay; references the Compressed Medicine curriculum as one substrate-specific instantiation.