The Specification Gap: Coordination Failure Under Partial Knowledge in Code Agents
Camilo Chacón Sartori

TL;DR
This paper investigates how multiple code-generating agents coordinate when specifications are incomplete. It reveals a persistent accuracy gap relative to a single-agent baseline, proposes an AST-based conflict detection method, and emphasizes the importance of detailed specifications for effective multi-agent code synthesis.
Contribution
The paper quantifies how specification detail affects multi-agent code integration and shows that richer specifications substantially improve coordination and recovery.
Findings
Two-agent integration accuracy drops from 58% to 25% as specification detail is stripped from full docstrings (L0) to bare signatures (L3).
An AST-based conflict detector achieves 97% precision without extra LLM calls.
Restoring full specifications recovers the single-agent performance ceiling.
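To make the idea of AST-based conflict detection concrete, the following is a minimal sketch, not the paper's detector. It assumes a conflict means that two code fragments use the same `self` attribute as a list in one place and as a dictionary in the other. The method-name sets, the function names (`infer_attr_kinds`, `find_conflicts`), and the two example fragments are all hypothetical. The check is purely static, so it needs no LLM call.

```python
import ast
import textwrap

# Hypothetical heuristics: method names that suggest a container kind.
LIST_METHODS = {"append", "extend", "insert", "sort"}
DICT_METHODS = {"items", "keys", "values", "get", "setdefault"}


def _is_self_attr(node):
    """True for expressions of the form self.<attr>."""
    return (isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "self")


def _literal_kind(value):
    """Classify the right-hand side of an assignment as list, dict, or unknown."""
    if isinstance(value, (ast.List, ast.ListComp)):
        return "list"
    if isinstance(value, (ast.Dict, ast.DictComp)):
        return "dict"
    if isinstance(value, ast.Call) and isinstance(value.func, ast.Name):
        if value.func.id in ("list", "dict"):
            return value.func.id
    return None


def infer_attr_kinds(source):
    """Map each self.<attr> to the container kinds its usage suggests."""
    kinds = {}
    tree = ast.parse(textwrap.dedent(source))
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            # e.g. self.stock = [] or self.stock = dict()
            for target in node.targets:
                kind = _literal_kind(node.value)
                if _is_self_attr(target) and kind:
                    kinds.setdefault(target.attr, set()).add(kind)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            # e.g. self.stock.append(...) or self.stock.values()
            owner, method = node.func.value, node.func.attr
            if _is_self_attr(owner):
                if method in LIST_METHODS:
                    kinds.setdefault(owner.attr, set()).add("list")
                elif method in DICT_METHODS:
                    kinds.setdefault(owner.attr, set()).add("dict")
    return kinds


def find_conflicts(source_a, source_b):
    """Attributes that both fragments use, but with disjoint inferred kinds."""
    a, b = infer_attr_kinds(source_a), infer_attr_kinds(source_b)
    return sorted(attr for attr in a.keys() & b.keys()
                  if a[attr].isdisjoint(b[attr]))


# Hypothetical outputs of two agents implementing parts of the same class.
AGENT_A = """
def __init__(self):
    self.stock = []

def add(self, name, qty):
    self.stock.append((name, qty))
"""

AGENT_B = """
def total(self):
    return sum(self.stock.values())
"""

conflicts = find_conflicts(AGENT_A, AGENT_B)
```

Here `conflicts` contains `stock`, because one fragment treats it as a list and the other as a dictionary. A heuristic of this kind only sees syntactic evidence, so it would miss disagreements that surface through indexing or iteration alone.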
Abstract
When multiple LLM-based code agents independently implement parts of the same class, they must agree on shared internal representations, even when the specification leaves those choices implicit. We study this coordination problem across 51 class-generation tasks, progressively stripping specification detail from full docstrings (L0) to bare signatures (L3), and introducing opposing structural biases (lists vs. dictionaries) to stress-test integration. Three findings emerge. First, a persistent specification gap: two-agent integration accuracy drops from 58% to 25% as detail is removed, while a single-agent baseline degrades more gracefully (89% to 56%), leaving a 25 to 39 pp coordination gap that is consistent across two Claude models (Sonnet, Haiku) and three independent runs. Second, an AST-based conflict detector achieves 97% precision at the weakest specification level without…
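The coordination problem described in the abstract can be illustrated with a small hypothetical case, which is not taken from the paper's 51 tasks. Two agents each receive only bare signatures for a `Gradebook` class. One is biased toward a list representation of `scores`, the other toward a dictionary, and each method is reasonable in isolation. All class, method, and attribute names below are assumptions made for illustration.

```python
class GradebookListBiased:
    """Hypothetical agent A output: scores is a list of (student, score) pairs."""

    def __init__(self):
        self.scores = []

    def record(self, student, score):
        self.scores.append((student, score))


def average_dict_biased(self):
    """Hypothetical agent B output: assumes scores is a dict student -> score."""
    values = self.scores.values()
    return sum(values) / len(values)


class IntegratedGradebook(GradebookListBiased):
    """Naive integration: agent A's state and writer, agent B's reader."""
    average = average_dict_biased


book = IntegratedGradebook()
book.record("ana", 90)

try:
    book.average()
    outcome = "ok"
except AttributeError as exc:
    # Raised because a list has no .values() method.
    outcome = f"integration failure: {exc}"
```

Neither fragment is wrong on its own. The failure appears only after integration, because the signature-only specification never says how `scores` is stored.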
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Topic Modeling
