Terminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks?

Spandan Garg; Vikram Nitin; Yufan Huang

arXiv:2605.03195 · cs.AI · May 6, 2026


TL;DR

Terminus-4B is a finetuned small language model that can replace frontier models in agentic terminal execution tasks, reducing token usage while maintaining performance.

Contribution

This paper introduces Terminus-4B, a finetuned small model that matches or exceeds frontier models in agentic terminal execution tasks.

Findings

01. Terminus-4B reduces token usage by up to 30% compared with the baseline.

02. It maintains performance on SWE-Bench Pro and on internal benchmarks.

03. It often surpasses frontier models such as Claude Sonnet and GPT-5.3-Codex.

Abstract

Modern coding agents increasingly delegate specialized subtasks to subagents: smaller, focused agentic loops that handle narrow responsibilities such as search, debugging, or terminal execution. This architectural pattern keeps the main agent's context window clean by isolating verbose outputs (e.g., build logs and test results) within the subagent's context. Typically, when agents delegate such tasks to subagents, the subagents themselves run on frontier models. In this paper, we investigate whether a finetuned small language model (SLM) can perform comparably to frontier models at agentic terminal execution. We present Terminus-4B, a Qwen3-4B model post-trained specifically for this task via supervised finetuning (SFT) and reinforcement learning (RL) with a rubric-based LLM-as-judge reward. In our extensive evaluation spanning various frontier models,…
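The context-isolation idea behind the subagent pattern can be illustrated with a minimal Python sketch. It is not the authors' implementation: `TerminalSubagent`, `MainAgent`, `SubagentResult`, and the "last few lines" summary (a stand-in for the report a model such as Terminus-4B would actually write) are all hypothetical. The sketch only shows that the subagent keeps the full command output in its own context while the main agent stores a one-line report.

```python
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class SubagentResult:
    """Compact report returned to the main agent (hypothetical schema)."""
    command: list[str]
    exit_code: int
    summary: str


@dataclass
class TerminalSubagent:
    """Hypothetical terminal-execution subagent.

    Full command output is kept only in this subagent's own context
    (`self.context`); the main agent receives a short summary instead.
    """
    tail_lines: int = 3
    context: list[str] = field(default_factory=list)

    def run(self, command: list[str]) -> SubagentResult:
        proc = subprocess.run(command, capture_output=True, text=True)
        output = proc.stdout + proc.stderr
        # Verbose output (build logs, test results, ...) stays here.
        self.context.append(output)
        # Hypothetical stand-in for a model-written summary: keep the last few lines.
        tail = output.strip().splitlines()[-self.tail_lines:]
        status = "succeeded" if proc.returncode == 0 else "failed"
        summary = f"exit {proc.returncode} ({status}); last lines: " + " | ".join(tail)
        return SubagentResult(command, proc.returncode, summary)


@dataclass
class MainAgent:
    """Hypothetical main agent: it only ever stores subagent summaries."""
    context: list[str] = field(default_factory=list)

    def delegate(self, subagent: TerminalSubagent, command: list[str]) -> SubagentResult:
        result = subagent.run(command)
        self.context.append(result.summary)
        return result


if __name__ == "__main__":
    # A noisy command that prints 1000 log lines.
    noisy = "for i in range(1000): print(f'log line {i}')"
    main = MainAgent()
    res = main.delegate(TerminalSubagent(), [sys.executable, "-c", noisy])
    print(res.summary)
    # prints: exit 0 (succeeded); last lines: log line 997 | log line 998 | log line 999
```

In this sketch the 1000 log lines remain in the subagent's context, while the main agent's context grows by a single string per delegated command, which is the effect the abstract attributes to the pattern.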
