CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion
Yusong Lin, Haiyang Wang, Shuzhe Wu, Lue Fan, Feiyang Pan, Sanyuan Zhao, Dandan Tu

TL;DR
CLI-Gym generates environment-intensive CLI tasks at scale by inverting environment histories. The tasks enable training more capable agents such as LiberCoder, which significantly improves performance on terminal benchmarks.
Contribution
The paper presents CLI-Gym, the first scalable pipeline for deriving environment-intensive CLI tasks using environment history inversion guided by execution feedback.
Findings
Generated 1,655 CLI tasks, the largest collection of its kind.
Fine-tuned LiberCoder improves accuracy on Terminal-Bench by 21.1%.
First public pipeline for scalable CLI task derivation.
Abstract
Agentic coding requires agents to interact effectively with runtime environments, such as command-line interfaces (CLI), to complete tasks like resolving dependency issues and fixing system problems. However, how such environment-intensive tasks can be obtained at scale to enhance agents' capabilities remains underexplored. To address this, based on an analogy between the Dockerfile and the agentic task, we propose to employ agents to simulate and explore environment histories, guided by execution feedback. By tracing the history of a healthy environment, its state can be inverted to an earlier one with runtime failures, from which a task can be derived by packing the buggy state and the corresponding error messages. With our method, named CLI-Gym, a total of 1,655 environment-intensive tasks are derived, the largest collection of its kind. Moreover, with curated successful…
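The inversion idea in the abstract can be illustrated with a toy sketch. This is not the authors' pipeline: it uses no agents and no Docker, and every name in it (`Step`, `Task`, `build`, `run_checks`, `invert`) is hypothetical. It assumes an environment can be modeled as a replayable history of steps, loosely like Dockerfile layers, and that "execution feedback" is a simple check for missing capabilities. Dropping one step from a healthy history stands in for inverting the environment to a failing state.

```python
from dataclasses import dataclass

# Toy model only: all names below are hypothetical and chosen for illustration.


@dataclass(frozen=True)
class Step:
    name: str      # label for the history entry
    provides: str  # capability this step adds to the environment


@dataclass
class Task:
    buggy_history: list   # state the agent would start from
    error_messages: list  # execution feedback packed with the task


def build(history):
    """Replay a history and return the set of capabilities it provides."""
    return {step.provides for step in history}


def run_checks(state, required):
    """Stand-in for execution feedback: one error per missing capability."""
    return [f"error: missing {need}" for need in sorted(required - state)]


def invert(history, required):
    """Drop one step at a time from a healthy history; keep variants that fail."""
    tasks = []
    for i in range(len(history)):
        earlier = history[:i] + history[i + 1:]
        errors = run_checks(build(earlier), required)
        if errors:  # only failing states yield a task
            tasks.append(Task(buggy_history=earlier, error_messages=errors))
    return tasks


healthy = [
    Step("install python", "python"),
    Step("install numpy", "numpy"),
    Step("set locale", "locale"),
]
required = {"python", "numpy"}

tasks = invert(healthy, required)
for task in tasks:
    print([step.name for step in task.buggy_history], task.error_messages)
```

In this toy setup, removing the locale step produces no error, so it yields no task; only states that fail the check are packed into tasks. A real system would replace `run_checks` with actual command execution inside an isolated environment.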
Taxonomy
Topics: Embedded Systems Design Techniques · Formal Methods in Verification · Advanced Software Engineering Methodologies
