On Data Engineering for Scaling LLM Terminal Capabilities

Renjie Pi; Grace Lam; Mohammad Shoeybi; Pooya Jannaty; Bryan Catanzaro; Wei Ping

arXiv:2602.21193·cs.CL·February 25, 2026

TL;DR

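This paper systematically studies data engineering practices for the terminal capabilities of large language models, introduces a synthetic task generation pipeline, and trains models that substantially improve terminal task performance (for example, Nemotron-Terminal-8B rises from 2.5% to 13.0% on Terminal-Bench 2.0). The resulting dataset and checkpoints are released to support further research.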

Contribution

It presents a novel synthetic task generation pipeline and a comprehensive analysis of data strategies, leading to improved terminal models and an open-source dataset.

Findings

01

Models trained on the new dataset outperform the Qwen3 base models they are initialized from on Terminal-Bench 2.0.

02

Scaling model size improves terminal task accuracy.

03

Open-sourcing datasets and checkpoints facilitates future research.

Abstract

Despite rapid recent progress in the terminal capabilities of large language models, the training data strategies behind state-of-the-art terminal agents remain largely undisclosed. We address this gap through a systematic study of data engineering practices for terminal agents, making two key contributions: (1) Terminal-Task-Gen, a lightweight synthetic task generation pipeline that supports seed-based and skill-based task construction, and (2) a comprehensive analysis of data and training strategies, including filtering, curriculum learning, long context training, and scaling behavior. Our pipeline yields Terminal-Corpus, a large-scale open-source dataset for terminal tasks. Using this dataset, we train Nemotron-Terminal, a family of models initialized from Qwen3 (8B, 14B, 32B) that achieve substantial gains on Terminal-Bench 2.0: Nemotron-Terminal-8B improves from 2.5% to 13.0%…
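To make the 8B result quoted above concrete, the small sketch below converts the two Terminal-Bench 2.0 scores (2.5% for the base model, 13.0% after training) into an absolute gain in percentage points and a relative multiplier. The helper name `gains` is hypothetical and not part of any released code; the 14B and 32B figures are cut off in this abstract excerpt, so they are deliberately left out.

```python
def gains(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Hypothetical helper: return (absolute gain in points, relative multiplier)."""
    absolute = after_pct - before_pct  # difference in percentage points
    relative = after_pct / before_pct  # how many times higher the new score is
    return absolute, relative


# Scores for Nemotron-Terminal-8B on Terminal-Bench 2.0, as quoted in the abstract.
abs_gain, rel_gain = gains(2.5, 13.0)
print(f"+{abs_gain:.1f} points, {rel_gain:.1f}x")  # prints "+10.5 points, 5.2x"
```

Reporting both numbers matters at low baselines: a 10.5-point absolute gain corresponds to roughly a fivefold relative improvement here.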

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques