The Conversations Beneath the Code: Triadic Data for Long-Horizon Software Engineering Agents

Yelin Kim

arXiv:2605.02244·cs.SE·May 5, 2026

TL;DR

This paper argues that long-horizon software engineering agents should be trained on triadic data, which captures human-human conversations, human-AI interactions, and the cross-functional work around them, because frontier agents have saturated short-horizon benchmarks without improving on long-horizon work.

Contribution

The paper introduces triadic data as essential for developing advanced SWE agents and details a four-tier framework for evaluating the quality of such data.

Findings

01 Triadic data can be captured in 12-18 months using existing methods.

02 It addresses four open questions in agent training.

03 The paper proposes a four-tier evidence framework for data quality assessment.

Abstract

Frontier software engineering agents have saturated short-horizon benchmarks while regressing on the work that constitutes senior engineering: long-horizon, multi-engineer, ambiguous-specification deliverables. This paper takes a position on what training data is needed to close the gap. The substrate for the next generation of SWE agents is neither larger GitHub scrapes nor more solo-agent trajectories nor -- sufficient by itself -- open human-AI dialogue logs. It is triadic data: synchronized capture of the human-human conversations where engineering context is formed, the human-AI sessions where that context is partially consumed, and the multi-week cross-functional work that surrounds both. We argue that the canonical instantiation of triadic data is two complementary products: long-horizon expert trajectories captured under stimulated-recall protocols, and simulated…
