Endless Terminals: Scaling RL Environments for Terminal Agents
Kanishk Gandhi, Shivam Garg, Noah D. Goodman, Dimitris Papailiopoulos

TL;DR
This paper introduces Endless Terminals, a fully autonomous pipeline that procedurally generates terminal-use tasks for training reinforcement learning agents. Models trained on the resulting 3255 tasks with vanilla PPO show substantial gains on held-out benchmarks.
Contribution
The paper presents a scalable, automated environment-generation pipeline for RL training and shows that it enables effective learning with a minimal interaction loop and a simple algorithm (vanilla PPO).
Findings
Agents trained on Endless Terminals outperform baselines on held-out benchmarks.
Models show substantial gains in task success rates after training.
Simple RL methods benefit greatly from the scalable environment pipeline.
Abstract
Environments are the bottleneck for self-improving agents. Current terminal benchmarks were built for evaluation, not training; reinforcement learning requires a scalable pipeline, not just a dataset. We introduce Endless Terminals, a fully autonomous pipeline that procedurally generates terminal-use tasks without human annotation. The pipeline has four stages: generating diverse task descriptions, building and validating containerized environments, producing completion tests, and filtering for solvability. From this pipeline we obtain 3255 tasks spanning file operations, log management, data processing, scripting, and database operations. We train agents using vanilla PPO with binary episode-level rewards and a minimal interaction loop: no retrieval, multi-agent coordination, or specialized tools. Despite this simplicity, models trained on Endless Terminals show substantial gains: on…
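To make the last two pipeline stages and the reward signal concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the `Task` class, the `episode_reward` and `filter_solvable` functions, and the use of a plain dict as a stand-in for a container's final state are all hypothetical. It shows one plausible reading of a binary episode-level reward (1.0 only if every completion test passes) and of solvability filtering (keep a task only if some attempt succeeds).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Assumption: a dict stands in for the final state of a containerized environment.
State = Dict[str, str]


@dataclass
class Task:
    """Hypothetical container for one generated task."""
    description: str
    # Each completion test inspects the final state and returns pass/fail.
    completion_tests: List[Callable[[State], bool]]


def episode_reward(task: Task, final_state: State) -> float:
    """Binary episode-level reward: 1.0 only if all completion tests pass."""
    if not task.completion_tests:
        # A task without tests cannot be verified, so it earns no reward.
        return 0.0
    passed = all(test(final_state) for test in task.completion_tests)
    return 1.0 if passed else 0.0


def filter_solvable(
    tasks: Sequence[Task],
    attempts: Sequence[Sequence[State]],
    min_successes: int = 1,
) -> List[Task]:
    """Keep tasks for which at least `min_successes` attempts earn reward 1.0.

    `attempts[i]` holds the final states reached by trial runs on `tasks[i]`.
    """
    kept = []
    for task, final_states in zip(tasks, attempts):
        successes = sum(episode_reward(task, s) == 1.0 for s in final_states)
        if successes >= min_successes:
            kept.append(task)
    return kept
```

In this sketch the reward is assigned once per episode from the final state, so the learner receives no partial credit for intermediate steps; the `min_successes` threshold is an assumed knob, not a value taken from the paper.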
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)
