Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms
Xinlin Wang, Mats Brorsson

TL;DR
This paper investigates the deployment trade-offs of small language models under different agent paradigms, emphasizing agent-centric design for efficiency and trustworthiness in resource-limited settings.
Contribution
It provides the first large-scale analysis of open-source models with fewer than 10B parameters under three paradigms (base model, single agent with tools, and multi-agent system), revealing the effectiveness of agent-centric approaches.
Findings
Single-agent systems balance performance and cost effectively.
Multi-agent systems add overhead with limited performance gains.
Agent paradigms can compensate for small models' limitations.
Abstract
Despite the impressive capabilities of large language models, their substantial computational costs, latency, and privacy risks hinder their widespread deployment in real-world applications. Small Language Models (SLMs) with fewer than 10 billion parameters present a promising alternative; however, their inherent limitations in knowledge and reasoning curtail their effectiveness. Existing research primarily focuses on enhancing SLMs through scaling laws or fine-tuning strategies while overlooking the potential of using agent paradigms, such as tool use and multi-agent collaboration, to systematically compensate for the inherent weaknesses of small models. To address this gap, this paper presents the first large-scale, comprehensive study of <10B open-source models under three paradigms: (1) the base model, (2) a single agent equipped with tools, and (3) a multi-agent system with…
