Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms

Xinlin Wang; Mats Brorsson

arXiv:2604.19299 · cs.CL · April 22, 2026

TL;DR

This paper investigates the deployment trade-offs of small language models under different agent paradigms, emphasizing agent-centric design for efficiency and trustworthiness in resource-limited settings.

Contribution

The paper presents what the authors describe as the first large-scale analysis of open-source models with fewer than 10B parameters under three paradigms (base model, single agent, and multi-agent), and it finds that agent-centric approaches are effective for such models.

Findings

01. Single-agent systems balance performance and cost effectively.

02. Multi-agent systems add overhead with limited performance gains.

03. Agent paradigms can compensate for small models' limitations.

Abstract

Despite the impressive capabilities of large language models, their substantial computational costs, latency, and privacy risks hinder their widespread deployment in real-world applications. Small Language Models (SLMs) with fewer than 10 billion parameters present a promising alternative; however, their inherent limitations in knowledge and reasoning curtail their effectiveness. Existing research primarily focuses on enhancing SLMs through scaling laws or fine-tuning strategies while overlooking the potential of using agent paradigms, such as tool use and multi-agent collaboration, to systematically compensate for the inherent weaknesses of small models. To address this gap, this paper presents the first large-scale, comprehensive study of <10B open-source models under three paradigms: (1) the base model, (2) a single agent equipped with tools, and (3) a multi-agent system with…
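The three paradigms named in the abstract can be sketched with a toy example. This is a minimal illustration under stated assumptions, not the authors' implementation: `stub_model`, `calculator`, the prompt markers, and the planner/solver/verifier split are all hypothetical. The abstract's description of the multi-agent setup is truncated, so the roles used here are only one plausible arrangement. The call counter shows why extra agents add overhead in this toy; it is not a measurement from the paper.

```python
from dataclasses import dataclass


@dataclass
class CallCounter:
    """Counts model invocations as a crude proxy for cost."""
    calls: int = 0


def stub_model(prompt: str, counter: CallCounter) -> str:
    """Hypothetical stand-in for a small language model (not a real API)."""
    counter.calls += 1
    if "TOOL_RESULT:" in prompt:
        # A tool result is in context, so the stub simply reads it back.
        return prompt.split("TOOL_RESULT:")[1].strip()
    if prompt.startswith("ROLE: planner"):
        return "plan: use a tool"
    if prompt.startswith("TOOLS:"):
        # Tools are advertised, so the stub requests one.
        return "CALL calculator"
    # No tools and no tool result: the stub cannot answer on its own.
    return "unsure"


def calculator(expression: str) -> str:
    """Hypothetical tool: evaluates 'a * b' or 'a + b' for integers."""
    left, op, right = expression.split()
    a, b = int(left), int(right)
    return str(a * b if op == "*" else a + b)


TOOLS = {"calculator": calculator}


def run_base(question: str, counter: CallCounter) -> str:
    """Paradigm 1: the base model answers directly."""
    return stub_model(question, counter)


def run_single_agent(question: str, counter: CallCounter, hint: str = "") -> str:
    """Paradigm 2: one agent that may call a tool before answering."""
    first = stub_model(f"TOOLS: calculator\n{hint}\n{question}", counter)
    if first.startswith("CALL "):
        tool_name = first[len("CALL "):]
        result = TOOLS[tool_name](question)
        return stub_model(f"{question}\nTOOL_RESULT: {result}", counter)
    return first


def run_multi_agent(question: str, counter: CallCounter) -> str:
    """Paradigm 3 (assumed roles): planner, tool-using solver, verifier."""
    plan = stub_model(f"ROLE: planner\n{question}", counter)
    answer = run_single_agent(question, counter, hint=plan)
    # The verifier re-reads the solver's answer; each role is one more call.
    return stub_model(f"ROLE: verifier\n{question}\nTOOL_RESULT: {answer}", counter)


if __name__ == "__main__":
    for name, runner in [("base", run_base),
                         ("single-agent", run_single_agent),
                         ("multi-agent", run_multi_agent)]:
        counter = CallCounter()
        print(name, runner("17 * 23", counter), counter.calls)
```

In this toy, the base run cannot answer, the single agent answers with two model calls, and the multi-agent run returns the same answer with four. That pattern only mirrors the shape of the trade-off the paper studies; the actual accuracy and cost figures are in the paper itself.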
