Task-Specific Efficiency Analysis: When Small Language Models Outperform Large Language Models

Jinghan Cao; Yu Ma; Xinjin Li; Qingyang Ren; Xiangyun Chen

arXiv:2603.21389 · cs.CL · March 24, 2026

Open Access

TL;DR

This paper introduces the Performance-Efficiency Ratio (PER), a metric that combines accuracy, throughput, memory, and latency, and uses it to compare 16 language models on five NLP tasks. Small models (0.5–3B parameters) achieve the highest PER on every task, which supports choosing them for resource-constrained deployments.

Contribution

The paper presents what the authors describe as the first comprehensive task-specific efficiency analysis, comparing 16 language models across five NLP tasks, and introduces PER as a metric for evaluating model efficiency.

Findings

01

Small models (0.5–3B parameters) outperform large models in PER across all five tasks.

02

PER integrates accuracy, throughput, memory, and latency through geometric mean normalization.

03

Results support deploying smaller models for resource-efficient NLP applications.

Abstract

Large Language Models achieve remarkable performance but incur substantial computational costs unsuitable for resource-constrained deployments. This paper presents the first comprehensive task-specific efficiency analysis comparing 16 language models across five diverse NLP tasks. We introduce the Performance-Efficiency Ratio (PER), a novel metric integrating accuracy, throughput, memory, and latency through geometric mean normalization. Our systematic evaluation reveals that small models (0.5–3B parameters) achieve superior PER scores across all given tasks. These findings establish quantitative foundations for deploying small models in production environments prioritizing inference efficiency over marginal accuracy gains.
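
The abstract names the four ingredients of PER and says they are combined through geometric mean normalization, but this page does not give the exact formula. The sketch below shows one plausible reading, not the authors' definition: each quantity is normalized against a reference model, memory and latency are inverted so that lower values score higher, and the four ratios are combined with a geometric mean. The function name `per_score`, the `ref` dictionary, the units, and all numbers are hypothetical and chosen only for illustration.

```python
from math import prod


def per_score(accuracy, throughput, memory_gb, latency_ms, ref):
    """Illustrative PER-style score (assumed form, not the paper's formula).

    Each term is a ratio against a reference model, oriented so that
    higher is better. The score is the geometric mean of the four ratios.
    """
    terms = [
        accuracy / ref["accuracy"],      # higher accuracy is better
        throughput / ref["throughput"],  # higher throughput is better
        ref["memory_gb"] / memory_gb,    # lower memory is better, so inverted
        ref["latency_ms"] / latency_ms,  # lower latency is better, so inverted
    ]
    return prod(terms) ** (1 / len(terms))


# Hypothetical reference (a large model) and a hypothetical small model.
# These numbers are made up for the example; they are not results from the paper.
reference = {"accuracy": 0.90, "throughput": 50.0, "memory_gb": 16.0, "latency_ms": 200.0}

large = per_score(0.90, 50.0, 16.0, 200.0, reference)
small = per_score(0.80, 200.0, 4.0, 50.0, reference)

print(f"large: {large:.2f}")
print(f"small: {small:.2f}")
```

Under this assumed form the reference model scores exactly 1, and a model that gives up some accuracy can still score well above 1 if its throughput, memory, and latency gains are large enough. A geometric mean is also less dominated by one extreme term than an arithmetic mean would be, which is a common reason to choose it when combining ratios on different scales.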

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Neural Network Applications