SLMQuant: Benchmarking Small Language Model Quantization for Practical Deployment
Jiacheng Wang, Yejun Zeng, Jinyang Guo, Yuqing Ma, Aishan Liu, Xianglong Liu

TL;DR
This paper introduces SLMQuant, a benchmark for evaluating quantization techniques on Small Language Models, revealing unique challenges and proposing tailored solutions for efficient deployment on edge devices.
Contribution
It systematically evaluates quantization methods on SLMs, highlighting differences from LLMs and proposing design principles for effective SLM compression.
Findings
SLMs exhibit different quantization sensitivities than LLMs.
Direct transfer of LLM quantization techniques to SLMs is often suboptimal.
Key factors influencing SLM quantization effectiveness are identified.
Abstract
Despite the growing interest in Small Language Models (SLMs) as resource-efficient alternatives to Large Language Models (LLMs), their deployment on edge devices remains challenging due to unresolved efficiency gaps in model compression. While quantization has proven effective for LLMs, its applicability to SLMs is significantly underexplored, with critical questions about differing quantization bottlenecks and efficiency profiles. This paper introduces SLMQuant, the first systematic benchmark for evaluating LLM compression techniques when applied to SLMs. Through comprehensive multi-track evaluations across diverse architectures and tasks, we analyze how state-of-the-art quantization methods perform on SLMs. Our findings reveal fundamental disparities between SLMs and LLMs in quantization sensitivity, demonstrating that direct transfer of LLM-optimized techniques leads to suboptimal…
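To make "quantization sensitivity" concrete, here is a minimal sketch of symmetric per-row round-to-nearest (RTN) weight quantization, the simplest baseline that LLM quantization methods build on. The sketch also includes a sweep that reports reconstruction error at several bit-widths. This is not SLMQuant's evaluation code. The function names (`quantize_rtn`, `sensitivity_sweep`), the Gaussian toy layer and the injected outlier are illustrative assumptions. The benchmark itself evaluates real SLMs across tasks rather than weight error on a toy matrix.

```python
import random


def quantize_rtn(weights, bits):
    """Symmetric round-to-nearest quantization of one weight row.

    Returns the dequantized values so the error can be measured directly.
    """
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax   # one scale per row (per channel)
    if scale == 0:
        return list(weights)                      # all-zero row: nothing to quantize
    out = []
    for w in weights:
        # Python's round() is round-half-to-even; clamp to the signed integer range
        q = max(-qmax - 1, min(qmax, round(w / scale)))
        out.append(q * scale)
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def sensitivity_sweep(matrix, bit_widths=(8, 6, 4, 3)):
    """Average per-row MSE of RTN quantization at each bit-width."""
    results = {}
    for bits in bit_widths:
        errs = [mse(row, quantize_rtn(row, bits)) for row in matrix]
        results[bits] = sum(errs) / len(errs)
    return results


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical stand-in for one linear layer: 16 rows x 64 Gaussian weights
    layer = [[random.gauss(0.0, 0.02) for _ in range(64)] for _ in range(16)]
    layer[0][0] = 0.5  # a single large outlier, inflating row 0's scale
    for bits, err in sensitivity_sweep(layer).items():
        print(f"{bits}-bit RTN: mean MSE = {err:.3e}")
```

In this toy setup, the outlier forces a large scale on its row, so most of that row's small weights collapse to zero at low bit-widths. Outlier handling of this kind is the issue that many LLM-oriented methods target. Whether such fixes transfer well to smaller models is the type of question the benchmark examines.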
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Natural Language Processing Techniques
