Yet Even Less Is Even Better For Agentic, Reasoning, and Coding LLMs

CodeArts Model Team: Yang Ye; Jingyuan Tan; Tianyue Jiang; Ruizhe Ye; Qiankun He; Jiarui Yang; Jian Dong; Sicong Liang; Chongjian Yue; Peibai Xu; Lufan Lu; Shiguan Pang; Taotao Qian; Junbao Hu; Yuechan Hao; Ensheng Shi; Qi Zhang; Yi Hao; Na Fan; Xin Tan; Shuai Yao; Zhiwei Shen; Zongchen Li; Yanlin Wang; Chong Chen; Yuchi Ma

arXiv:2604.00824 · cs.SE · April 7, 2026

TL;DR

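This paper introduces STITCH, a coarse-to-fine trajectory filtering mechanism at the core of an end-to-end training framework. Inspired by the "Less-Is-More" hypothesis from mathematical reasoning, it improves the agentic, reasoning, and coding capabilities of large language models using fewer but higher-quality training trajectories.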

Contribution

The authors extend the "Less-Is-More" hypothesis to agentic tasks and develop STITCH, a coarse-to-fine filtering mechanism that improves training efficiency and performance across multiple agent frameworks, model scales (30B to 355B), and programming languages (Python, Java, and ArkTS).

Findings

01. Models trained with STITCH achieve up to 63.16% relative improvement on SWE-bench Verified.

02. MiniMax-M2.5-STITCH achieves 43.75% on Multi-SWE-bench Java.

03. GLM-4.7-STITCH raises the compilation success rate to 61.31% on HarmonyOS.
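The abstract describes the 63.16% SWE-bench Verified figure as a relative improvement, which is measured against the baseline score and is not the same as a gain in absolute percentage points. The short sketch that follows makes the distinction concrete; the helper names and the scores in it are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Percentage gain measured against the baseline score (hypothetical helper)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


def absolute_improvement(baseline: float, improved: float) -> float:
    """Difference between two scores, in percentage points (hypothetical helper)."""
    return improved - baseline


# Hypothetical resolve rates in %, chosen for illustration only.
base, tuned = 40.0, 50.0
print(f"absolute: {absolute_improvement(base, tuned):.2f} points")  # absolute: 10.00 points
print(f"relative: {relative_improvement(base, tuned):.2f}%")        # relative: 25.00%
```

A 10-point gain from a 40% baseline is a 25% relative improvement, so a relative figure can look much larger than the underlying change in resolve rate.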

Abstract

Training effective software engineering agents requires large volumes of task-specific trajectories, incurring substantial data construction costs. Inspired by the "Less-Is-More" hypothesis in mathematical reasoning, we investigate its extension to agentic scenarios and propose an end-to-end training framework that achieves superior agentic capabilities with fewer but higher-quality training trajectories. This is achieved via STITCH (Sliding-memory Trajectory Inference and Task Chunking Heuristic), a coarse-to-fine mechanism that filters low-value noise and retains decision-critical tokens to maximize training signal quality. We conduct experiments across multiple agent frameworks (e.g., mini-SWE-agent, MSWE-agent), model scales (30B to 355B), and multilingual settings (Python, Java, and ArkTS). On SWE-bench Verified, models trained with STITCH achieve up to 63.16% relative improvement…
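The abstract describes STITCH only at a high level: a coarse stage that filters low-value noise, followed by a fine stage that retains decision-critical tokens. The following is a hypothetical sketch of that general coarse-to-fine pattern, not the authors' algorithm. The `Step`/`Trajectory` types, the per-step `value` score, the `resolved` flag used as the coarse signal, the `threshold`, and the `window` of preceding context are all assumptions introduced here. For simplicity the sketch also works at the level of agent steps rather than individual tokens.

```python
from dataclasses import dataclass


@dataclass
class Step:
    text: str      # one agent action or observation (hypothetical granularity)
    value: float   # hypothetical decision-relevance score in [0, 1]


@dataclass
class Trajectory:
    steps: list[Step]
    resolved: bool  # whether the task was solved (hypothetical coarse signal)


def coarse_filter(trajs: list[Trajectory], min_steps: int = 1) -> list[Trajectory]:
    """Coarse stage (assumed): drop whole trajectories that failed or are too short."""
    return [t for t in trajs if t.resolved and len(t.steps) >= min_steps]


def fine_mask(traj: Trajectory, threshold: float = 0.5, window: int = 1) -> list[bool]:
    """Fine stage (assumed): keep steps scoring >= threshold, plus up to
    `window` preceding steps as context for each kept step."""
    keep = [s.value >= threshold for s in traj.steps]
    mask = list(keep)
    for i, critical in enumerate(keep):
        if critical:
            # Slide a small window backwards to retain the lead-up to the decision.
            for j in range(max(0, i - window), i):
                mask[j] = True
    return mask


trajs = [
    Trajectory([Step("ls", 0.1), Step("cat foo.py", 0.3),
                Step("edit foo.py", 0.9), Step("echo done", 0.05)], resolved=True),
    Trajectory([Step("ls", 0.2)], resolved=False),
]
kept = coarse_filter(trajs)
mask = fine_mask(kept[0], threshold=0.5, window=1)
selected = [s.text for s, m in zip(kept[0].steps, mask) if m]
print(selected)  # ['cat foo.py', 'edit foo.py']
```

In this sketch the difficult part is producing the per-step `value` scores. How STITCH estimates decision relevance is not specified in this excerpt, so the scores here are placeholders.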