Pioneer Agent: Continual Improvement of Small Language Models in Production

Dhruv Atreja; Julia White; Nikhil Nayak; Kelton Zhang; Henrijs Princis; George Hurn-Maloney; Ash Lewis; Urchade Zaratiana

arXiv:2604.09791·cs.AI·April 14, 2026

TL;DR

Pioneer Agent is a closed-loop system that automates the continual improvement of small language models in production by optimizing data, training, and error diagnosis based on downstream feedback.

Contribution

The paper introduces Pioneer Agent, a system that automates data curation, model training, and error diagnosis for small language models in production environments.

Findings

01 Improves model performance by 1.6-83.8 points across various benchmarks.

02 Maintains performance in all AdaptFT-Bench scenarios, unlike naive retraining.

03 Enhances deployment metrics, e.g., intent classification from 84.9% to 99.3%.

Abstract

Small language models are attractive for production deployment due to their low cost, fast inference, and ease of specialization. However, adapting them to a specific task remains a challenging engineering loop, driven not by training itself but by surrounding decisions: data curation, failure diagnosis, regression avoidance, and iteration control. We present Pioneer Agent, a closed-loop system that automates this lifecycle. In cold-start mode, given only a natural-language task description, the agent acquires data, constructs evaluation sets, and iteratively trains models by jointly optimizing data, hyperparameters, and learning strategy. In production mode, given a deployed model with labeled failures, it diagnoses error patterns, constructs targeted training data, and retrains under explicit regression constraints. To evaluate this setting, we introduce AdaptFT-Bench, a benchmark of…
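
The production-mode step described above (retraining "under explicit regression constraints") can be illustrated with a small acceptance gate. This is a minimal sketch under stated assumptions, not the paper's implementation: the names `accuracy`, `accept_candidate`, `failure_set`, `regression_set`, and `max_regression_drop` are all hypothetical, models are stand-in Python callables, and plain accuracy is assumed as the metric.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical types for illustration: an example is (input text, gold label),
# and a "model" is any callable mapping input text to a predicted label.
Example = Tuple[str, str]
Model = Callable[[str], str]


def accuracy(model: Model, examples: Sequence[Example]) -> float:
    """Fraction of examples the model labels correctly (0.0 for an empty set)."""
    if not examples:
        return 0.0
    correct = sum(1 for text, label in examples if model(text) == label)
    return correct / len(examples)


def accept_candidate(
    deployed: Model,
    candidate: Model,
    failure_set: Sequence[Example],
    regression_set: Sequence[Example],
    max_regression_drop: float = 0.0,
) -> bool:
    """Accept a retrained model only if it improves on the labeled failures
    AND does not lose more than `max_regression_drop` accuracy on examples
    the deployed model is expected to keep handling (assumed gate, not the
    paper's exact criterion)."""
    improves = accuracy(candidate, failure_set) > accuracy(deployed, failure_set)
    drop = accuracy(deployed, regression_set) - accuracy(candidate, regression_set)
    return improves and drop <= max_regression_drop


# Toy intent classifiers built from lookup tables (purely illustrative data).
def make_model(table: dict) -> Model:
    return lambda text: table.get(text, "other")


regression_set = [("refund please", "billing"), ("reset password", "account")]
failure_set = [("cancel my plan", "billing")]

deployed = make_model({"refund please": "billing", "reset password": "account"})
good = make_model({"refund please": "billing", "reset password": "account",
                   "cancel my plan": "billing"})
# Fixes the failure but breaks a previously correct prediction.
bad = make_model({"refund please": "billing", "reset password": "billing",
                  "cancel my plan": "billing"})

good_accepted = accept_candidate(deployed, good, failure_set, regression_set)
bad_accepted = accept_candidate(deployed, bad, failure_set, regression_set)
```

In this toy setup the first candidate fixes the labeled failure without changing any regression-set prediction, while the second fixes the failure but mislabels a previously correct input, so a zero-tolerance gate would reject it. A real system would presumably use larger held-out sets and task-appropriate metrics.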
