Rethinking the Value of Agent-Generated Tests for LLM-Based Software Engineering Agents

Zhi Chen; Zhensu Sun; Yuling Shi; Chao Peng; Xiaodong Gu; David Lo; Lingxiao Jiang

arXiv:2602.07900 · cs.SE · April 10, 2026

TL;DR

This paper investigates the value of tests written by LLM-based software engineering agents, finding that such tests mainly serve as observational feedback and do not significantly affect task success.

Contribution

The study provides empirical evidence that current agent-written tests function more as part of the agent's working process than as a driver of better final outcomes, challenging assumptions about their utility.

Findings

01

Test writing is common across models but does not differ significantly between resolved and unresolved tasks.

02

Agent-written tests mainly serve as observational feedback, with print statements being more common than assertions.

03

Prompt modifications to increase or decrease test writing do not significantly affect final task outcomes.
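
To make the distinction in finding 02 concrete, here is a minimal sketch of how one might tell observational feedback (print calls) apart from verification (assertions) in an agent-written Python test script. It is not the authors' analysis pipeline; the function name `count_feedback_signals`, the counting rules, and the sample script are all assumptions made for illustration.

```python
import ast


def count_feedback_signals(source: str) -> dict:
    """Count print calls and assertion-style checks in Python source.

    Hypothetical heuristic: a bare `assert` statement or a method call whose
    name starts with "assert" (e.g. unittest's assertEqual) counts as an
    assertion; a call to the built-in `print` counts as a print.
    """
    tree = ast.parse(source)
    counts = {"prints": 0, "assertions": 0}
    for node in ast.walk(tree):
        if isinstance(node, ast.Assert):
            counts["assertions"] += 1
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name) and func.id == "print":
                counts["prints"] += 1
            elif isinstance(func, ast.Attribute) and func.attr.startswith("assert"):
                counts["assertions"] += 1
    return counts


# Hypothetical agent-written script: two observations, one check.
example = '''
result = parse("a,b")
print("parsed:", result)
print(len(result))
assert result == ["a", "b"]
'''

print(count_feedback_signals(example))  # {'prints': 2, 'assertions': 1}
```

A script with many prints and few assertions reports what the code does without checking it against an expected result, which is the sense in which such tests are observational.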

Abstract

Large Language Model (LLM) code agents increasingly resolve repository-level issues by iteratively editing code, invoking tools, and validating candidate patches. In these workflows, agents often write tests on the fly, but the value of this behavior remains unclear. For example, GPT-5.2 writes almost no new tests yet achieves performance comparable to top-ranking agents. This raises a central question: do such tests meaningfully improve issue resolution, or do they mainly mimic a familiar software-development practice while consuming interaction budget? To better understand the role of agent-written tests, we analyze trajectories produced by six strong LLMs on SWE-bench Verified. Our results show that test writing is common, but resolved and unresolved tasks within the same model exhibit similar test-writing frequencies. When tests are written, they mainly serve as observational…
