The Conversations Beneath the Code: Triadic Data for Long-Horizon Software Engineering Agents
Yelin Kim

TL;DR
This paper advocates for triadic data, which captures human-human conversations, human-AI interactions, and cross-functional work, as the training substrate for long-horizon software engineering agents, addressing the limitations of current short-horizon benchmarks.
Contribution
It introduces the concept of triadic data as essential for developing advanced SWE agents and details a four-tier framework for evaluating the quality of such data.
Findings
Triadic data can be captured in 12-18 months using existing methods.
It addresses four open questions in agent training.
The paper proposes a four-tier evidence framework for assessing data quality.
Abstract
Frontier software engineering agents have saturated short-horizon benchmarks while regressing on the work that constitutes senior engineering: long-horizon, multi-engineer, ambiguous-specification deliverables. This paper takes a position on what training data is needed to close the gap. The substrate for the next generation of SWE agents is neither larger GitHub scrapes nor more solo-agent trajectories nor -- sufficient by itself -- open human-AI dialogue logs. It is triadic data: synchronized capture of the human-human conversations where engineering context is formed, the human-AI sessions where that context is partially consumed, and the multi-week cross-functional work that surrounds both. We argue that the canonical instantiation of triadic data is two complementary products: long-horizon expert trajectories captured under stimulated-recall protocols, and simulated…
