SLMQuant: Benchmarking Small Language Model Quantization for Practical Deployment

Jiacheng Wang; Yejun Zeng; Jinyang Guo; Yuqing Ma; Aishan Liu; Xianglong Liu

arXiv:2511.13023 · cs.LG · November 18, 2025

Open Access

TL;DR

This paper introduces SLMQuant, a benchmark for evaluating quantization techniques on Small Language Models, revealing unique challenges and proposing tailored solutions for efficient deployment on edge devices.

Contribution

The paper systematically evaluates quantization methods on SLMs, highlighting how they differ from LLMs and proposing design principles for effective SLM compression.

Findings

01. SLMs exhibit different quantization sensitivities than LLMs.

02. Direct transfer of LLM quantization techniques to SLMs is often suboptimal.

03. Key factors influencing SLM quantization effectiveness are identified.

Abstract

Despite the growing interest in Small Language Models (SLMs) as resource-efficient alternatives to Large Language Models (LLMs), their deployment on edge devices remains challenging due to unresolved efficiency gaps in model compression. While quantization has proven effective for LLMs, its applicability to SLMs is significantly underexplored, with critical questions about differing quantization bottlenecks and efficiency profiles. This paper introduces SLMQuant, the first systematic benchmark for evaluating LLM compression techniques when applied to SLMs. Through comprehensive multi-track evaluations across diverse architectures and tasks, we analyze how state-of-the-art quantization methods perform on SLMs. Our findings reveal fundamental disparities between SLMs and LLMs in quantization sensitivity, demonstrating that direct transfer of LLM-optimized techniques leads to suboptimal…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Natural Language Processing Techniques