Task-Specific Efficiency Analysis: When Small Language Models Outperform Large Language Models
Jinghan Cao, Yu Ma, Xinjin Li, Qingyang Ren, Xiangyun Chen

TL;DR
This paper introduces a new metric, the Performance-Efficiency Ratio (PER), for evaluating NLP models. Its evaluation shows that small models often outperform large models in efficiency across various tasks, which can guide deployment decisions under resource constraints.
Contribution
The paper presents the first comprehensive task-specific efficiency analysis comparing 16 models and introduces PER as a novel metric for evaluating model efficiency.
Findings
Small models (0.5–3B parameters) outperform large models in PER across tasks.
PER effectively balances accuracy, throughput, memory, and latency.
Results support deploying smaller models for resource-efficient NLP applications.
Abstract
Large Language Models achieve remarkable performance but incur substantial computational costs that make them unsuitable for resource-constrained deployments. This paper presents the first comprehensive task-specific efficiency analysis, comparing 16 language models across five diverse NLP tasks. We introduce the Performance-Efficiency Ratio (PER), a novel metric that integrates accuracy, throughput, memory, and latency through geometric mean normalization. Our systematic evaluation reveals that small models (0.5–3B parameters) achieve superior PER scores across all five tasks. These findings establish quantitative foundations for deploying small models in production environments that prioritize inference efficiency over marginal accuracy gains.
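The abstract does not give the exact PER formula, so the following is only a minimal sketch of one plausible reading of "geometric mean normalization". Everything here is an assumption: the class `ModelStats`, the function `per_scores`, and the choice to scale each metric against the best model in the comparison set (so that every normalized score is higher-is-better and lies in (0, 1]) are hypothetical and not the authors' definition.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelStats:
    """Hypothetical per-model measurements for one task."""
    name: str
    accuracy: float    # task accuracy in (0, 1]; higher is better
    throughput: float  # e.g. tokens/s; higher is better
    memory_gb: float   # peak memory; lower is better
    latency_ms: float  # per-request latency; lower is better


def per_scores(models: list[ModelStats]) -> dict[str, float]:
    """Hypothetical PER: geometric mean of four normalized metrics.

    Assumption: each metric is divided by (or divides) the best value in
    the comparison set, so the best model on a metric scores 1.0 on it.
    """
    best_acc = max(m.accuracy for m in models)
    best_tput = max(m.throughput for m in models)
    best_mem = min(m.memory_gb for m in models)
    best_lat = min(m.latency_ms for m in models)

    scores = {}
    for m in models:
        normalized = [
            m.accuracy / best_acc,     # higher accuracy -> closer to 1
            m.throughput / best_tput,  # higher throughput -> closer to 1
            best_mem / m.memory_gb,    # lower memory -> closer to 1
            best_lat / m.latency_ms,   # lower latency -> closer to 1
        ]
        # Geometric mean: the n-th root of the product of the n scores
        scores[m.name] = math.prod(normalized) ** (1 / len(normalized))
    return scores


if __name__ == "__main__":
    # Toy inputs for illustration only; these are NOT measurements from the paper.
    models = [
        ModelStats("small", accuracy=0.80, throughput=200.0, memory_gb=2.0, latency_ms=50.0),
        ModelStats("large", accuracy=0.90, throughput=40.0, memory_gb=16.0, latency_ms=250.0),
    ]
    for name, score in per_scores(models).items():
        print(f"{name}: PER = {score:.3f}")
```

With these toy numbers the small model gives up some accuracy but wins on throughput, memory, and latency, so it receives the higher score. A geometric mean, unlike an arithmetic mean, penalizes a model heavily when it is poor on any single dimension, which matches the intuition of a balanced efficiency metric.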
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Neural Network Applications
