Revisiting DAgger in the Era of LLM-Agents

Changhao Li; Rushi Qiang; Jiawei Huang; Chenxiao Gao; Chao Zhang; Niao He; Bo Dai

arXiv:2605.12913·cs.LG·May 14, 2026

TL;DR

This paper revisits the DAgger algorithm for training long-horizon language model agents and reports that it mitigates covariate shift and improves performance on software engineering benchmarks.

Contribution

The paper adapts DAgger to multi-turn LM agents, combining on-policy interaction with the environment and dense supervised labels from a teacher.

Findings

01

DAgger improves performance over baseline models on SWE-bench Verified.

02

A 4B-scale DAgger agent outperforms some larger models on software engineering tasks.

03

An 8B-scale DAgger agent surpasses existing SWE systems and approaches the performance of larger models.

Abstract

Long-horizon LM agents learn from multi-turn interaction, where a single early mistake can alter the subsequent state distribution and derail the whole trajectory. Existing recipes fall short in complementary ways: supervised fine-tuning provides dense teacher supervision but suffers from covariate shift because it is trained on off-policy teacher trajectories, whereas reinforcement learning with verifiable rewards avoids this off-policy mismatch by learning from on-policy rollouts but receives only sparse outcome feedback. We address this dilemma by revisiting Dataset Aggregation (DAgger) for multi-turn LM agents: the algorithm collects trajectories through a turn-level interpolation of student and teacher policies, and the student is then trained on these trajectories using supervised labels provided by the teacher. By directly interacting with environments, we expose the model to realistic…
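
The loop described in the abstract can be illustrated with a minimal tabular sketch. It is not the paper's implementation. The toy environment (`toy_step`), the toy teacher (`toy_teacher`), the lookup-table student (`fit_student`), and the geometric decay of the mixing probability `beta` are all assumptions made for illustration. In the actual setting the student would be a language model and fitting would be supervised fine-tuning.

```python
import random
from typing import Callable, List, Tuple

State = int
Action = int
Policy = Callable[[State], Action]
Dataset = List[Tuple[State, Action]]

NUM_STATES = 11  # size of the hypothetical toy state space


def toy_teacher(state: State) -> Action:
    """Hypothetical expert: the 'correct' action is the parity of the state."""
    return state % 2


def toy_step(state: State, action: Action) -> State:
    """Hypothetical dynamics: the next state depends on the executed action."""
    return (state + 1 + action) % NUM_STATES


def collect_trajectory(student: Policy, teacher: Policy, beta: float,
                       horizon: int, rng: random.Random,
                       step: Callable[[State, Action], State]) -> Dataset:
    """Roll out a turn-level mixture of teacher and student.

    Every visited state is labeled with the teacher's action, regardless of
    which policy's action was actually executed at that turn.
    """
    state, labeled = 0, []
    for _ in range(horizon):
        teacher_action = teacher(state)
        labeled.append((state, teacher_action))
        # Turn-level interpolation: execute the teacher's action with
        # probability beta, otherwise the student's own action.
        action = teacher_action if rng.random() < beta else student(state)
        state = step(state, action)
    return labeled


def fit_student(dataset: Dataset) -> Policy:
    """Stand-in for supervised training: memorize the teacher labels."""
    table = dict(dataset)
    return lambda s: table.get(s, 0)  # default action 0 on unseen states


def dagger(teacher: Policy, step: Callable[[State, Action], State],
           iterations: int = 5, horizon: int = 8, beta0: float = 1.0,
           decay: float = 0.5, seed: int = 0) -> Tuple[Policy, Dataset]:
    """Aggregate teacher-labeled data over iterations and refit the student."""
    rng = random.Random(seed)
    dataset: Dataset = []
    student: Policy = lambda s: 0  # untrained student
    beta = beta0
    for _ in range(iterations):
        dataset.extend(
            collect_trajectory(student, teacher, beta, horizon, rng, step))
        student = fit_student(dataset)  # train on the aggregated dataset
        beta *= decay  # assumed schedule: rely less on the teacher over time
    return student, dataset


if __name__ == "__main__":
    student, dataset = dagger(toy_teacher, toy_step)
    agree = sum(student(s) == toy_teacher(s) for s in range(NUM_STATES))
    print(f"{len(dataset)} labeled turns; student matches teacher on "
          f"{agree}/{NUM_STATES} states")
```

Because labels always come from the teacher while the visited states increasingly come from the student's own actions, the aggregated dataset covers states the student actually reaches. This is how the approach targets covariate shift while keeping dense per-turn supervision.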
