TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents
Kaijie Zhu, Yuzhou Nie, Yijiang Li, Yiming Huang, Jialian Wu, Jiang Liu, Ximeng Sun, Zhenfei Yin, Lun Wang, Zicheng Liu, Emad Barsoum, William Yang Wang, Wenbo Guo

TL;DR
TermiGen introduces a comprehensive pipeline for creating high-fidelity, verifiable environments and resilient trajectories, significantly improving open-weight LLMs' ability to execute complex terminal tasks by reducing hallucinations and enhancing error recovery.
Contribution
It presents a novel end-to-end method for synthesizing environments and trajectories with error correction, leading to state-of-the-art performance on terminal task benchmarks.
Findings
Achieved a 31.3% pass rate on TerminalBench with TermiGen-Qwen2.5-Coder-32B.
Outperformed existing baselines and proprietary models like o4-mini.
Generated diverse, verifiable environments and error-rich trajectories for training.
Abstract
Executing complex terminal tasks remains a significant challenge for open-weight LLMs, constrained by two fundamental limitations. First, high-fidelity, executable training environments are scarce: environments synthesized from real-world repositories are neither diverse nor scalable, while trajectories synthesized by LLMs suffer from hallucinations. Second, standard instruction tuning uses expert trajectories that rarely exhibit the simple mistakes common to smaller models. This creates a distributional mismatch, leaving student models ill-equipped to recover from their own runtime failures. To bridge these gaps, we introduce TermiGen, an end-to-end pipeline for synthesizing verifiable environments and resilient expert trajectories. TermiGen first generates functionally valid tasks and Docker containers via an iterative multi-agent refinement loop. Subsequently, we employ a Generator-Critic…
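As an illustration of what an iterative refinement loop of this general kind can look like, here is a minimal Python sketch. It is not the authors' implementation: the abstract gives no implementation details, so `Candidate`, `refine_until_valid`, `stub_propose`, and `stub_validate` are hypothetical names, and the validation rule is a toy check standing in for actually building and running a container.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Candidate:
    """Hypothetical task candidate: an instruction plus an environment spec."""
    instruction: str
    dockerfile: str
    feedback: List[str] = field(default_factory=list)


def refine_until_valid(
    propose: Callable[[Optional[Candidate]], Candidate],
    validate: Callable[[Candidate], List[str]],
    max_rounds: int = 3,
) -> Optional[Candidate]:
    """Propose a candidate, validate it, and feed errors back until it passes.

    Returns the first candidate with no validation errors, or None if the
    round budget is exhausted.
    """
    candidate: Optional[Candidate] = None
    for _ in range(max_rounds):
        candidate = propose(candidate)   # a proposing agent (assumed)
        errors = validate(candidate)     # a checking agent (assumed)
        if not errors:
            return candidate
        candidate.feedback = errors      # errors inform the next proposal
    return None


def stub_propose(prev: Optional[Candidate]) -> Candidate:
    """Toy proposer: the first draft lacks a base image; the repair adds one."""
    if prev is None:
        return Candidate("Count the lines in /data/log.txt", "RUN echo ready")
    return Candidate(prev.instruction, "FROM ubuntu:22.04\n" + prev.dockerfile)


def stub_validate(candidate: Candidate) -> List[str]:
    """Toy validator: only checks that the spec begins with a FROM line."""
    if not candidate.dockerfile.startswith("FROM "):
        return ["Dockerfile must begin with a FROM instruction"]
    return []


result = refine_until_valid(stub_propose, stub_validate)
```

In this sketch the first proposal fails validation and the second one passes, so `result` holds the repaired candidate. A real pipeline would replace the stubs with LLM calls and an actual container build.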
Taxonomy
Topics: Artificial Intelligence in Games · Multimodal Machine Learning Applications · Topic Modeling
