Theory of Code Space: Do Code Agents Understand Software Architecture?

Grigory Sapunov

arXiv:2603.00601 · cs.SE · March 20, 2026

TL;DR

The paper introduces ToCS, a benchmark to evaluate AI code agents' ability to understand and maintain software architecture, revealing model-dependent strengths and weaknesses in exploration and belief retention.

Contribution

It presents the ToCS benchmark for assessing architectural understanding in code agents and analyzes how different models explore, externalize, and maintain beliefs about code structure.

Findings

1. For some models, active exploration yields better architectural maps than passive viewing of all files.
2. Structured belief maps can scaffold some models' understanding.
3. Belief maintenance varies with model size and architecture.

Abstract

AI code agents excel at isolated tasks yet struggle with multi-file software engineering that requires architectural understanding. We introduce Theory of Code Space (ToCS), a benchmark that evaluates whether agents can construct, maintain, and update coherent architectural beliefs during codebase exploration. Agents explore procedurally generated codebases under partial observability (opening files under a budget) and periodically externalize their belief state as structured JSON, producing a time series of architectural understanding. Three findings emerge from experiments with four baselines and six frontier LLMs. First, the Active-Passive Gap is model-dependent: one model builds better maps through active exploration than by seeing all files at once, while another shows the opposite, revealing that active exploration is itself a non-trivial capability absent from some models.…
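The protocol described in the abstract (open files under a budget, periodically dump beliefs as JSON, score the resulting time series) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the six-module toy codebase, the dependency-edge F1 score, the `follow_imports` and `alphabetical` policies, and all function names are hypothetical. None of it is the ToCS implementation, its JSON schema, or its metrics.

```python
import json
from typing import Callable, Optional

Graph = dict[str, set[str]]
# Hypothetical policy type: pick the next file from (beliefs, opened files, visible file list).
Policy = Callable[[Graph, set[str], list[str]], Optional[str]]

# Hypothetical toy codebase: module -> modules it imports (not a ToCS-generated codebase).
GROUND_TRUTH: Graph = {
    "api": {"service", "auth"},
    "service": {"repo", "cache"},
    "auth": {"repo"},
    "repo": {"db"},
    "cache": set(),
    "db": set(),
}


def edge_f1(belief: Graph, truth: Graph) -> float:
    """F1 over (importer, imported) edges; an assumed metric, not necessarily the paper's."""
    pred = {(s, d) for s, ds in belief.items() for d in ds}
    gold = {(s, d) for s, ds in truth.items() for d in ds}
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def follow_imports(belief: Graph, opened: set[str], files: list[str],
                   entry: str = "api") -> Optional[str]:
    """Active policy: open the entry point, then imported-but-unopened modules."""
    if entry not in opened:
        return entry
    frontier = sorted({d for ds in belief.values() for d in ds} - opened)
    return frontier[0] if frontier else None


def alphabetical(belief: Graph, opened: set[str], files: list[str]) -> Optional[str]:
    """Baseline policy: ignore observations and open files in name order."""
    return next((f for f in files if f not in opened), None)


def explore(truth: Graph, policy: Policy, budget: int, snapshot_every: int = 1) -> list[str]:
    """Open at most `budget` files; every `snapshot_every` opens, emit the belief as JSON."""
    files = sorted(truth)  # the file tree is visible, file contents are not
    belief: Graph = {}
    opened: set[str] = set()
    snapshots: list[str] = []
    for step in range(1, budget + 1):
        name = policy(belief, opened, files)
        if name is None:  # nothing left to open
            break
        opened.add(name)
        # Partial observability: opening a file reveals only that file's imports.
        belief[name] = set(truth[name])
        if step % snapshot_every == 0:
            state = {"step": step,
                     "dependencies": {m: sorted(ds) for m, ds in sorted(belief.items())}}
            snapshots.append(json.dumps(state))
    return snapshots


def score_series(snapshots: list[str], truth: Graph) -> list[float]:
    """Parse each JSON snapshot and score it, giving a time series of understanding."""
    series = []
    for raw in snapshots:
        deps = json.loads(raw)["dependencies"]
        series.append(round(edge_f1({m: set(ds) for m, ds in deps.items()}, truth), 3))
    return series


if __name__ == "__main__":
    active = explore(GROUND_TRUTH, follow_imports, budget=3)
    fixed = explore(GROUND_TRUTH, alphabetical, budget=3)
    print(score_series(active, GROUND_TRUTH))  # [0.5, 0.667, 0.8]
    print(score_series(fixed, GROUND_TRUTH))   # [0.5, 0.667, 0.667]
```

With a budget of three opens, the import-following policy ends with a higher score than the fixed order because it uses what it has already observed to choose the next file, which loosely echoes the abstract's point that active exploration is a capability in its own right. The sketch does not model the passive condition, in which a model sees all files at once.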

Taxonomy

Topics: Software Engineering Research · Advanced Software Engineering Methodologies · Scientific Computing and Data Management