TL;DR
EdgeRazor is a novel lightweight framework that combines mixed-precision quantization and distillation techniques to efficiently compress large language models with minimal performance loss.
Contribution
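It introduces a three-module framework for mixed-precision quantization-aware distillation (structural quantization with mixed precision, layer-adaptive feature distillation, and entropy-aware KL divergence) that improves the compression and efficiency of LLMs beyond existing methods.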
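Mixed precision means different parts of the model get different bit-widths, so the model's overall precision is a parameter-weighted average. That is one way a fractional figure such as 1.88 or 2.79 bits could arise, although the source does not give EdgeRazor's actual allocation. The sketch below computes that average and the implied weight storage. The layer groups, parameter counts, and bit-widths are all hypothetical.

```python
def average_bit_width(groups):
    """Parameter-weighted mean bit-width of a mixed-precision assignment.

    `groups` is a list of (name, parameter_count, bits) tuples.
    """
    total_params = sum(n for _, n, _ in groups)
    total_bits = sum(n * bits for _, n, bits in groups)
    return total_bits / total_params


def weight_storage_gb(groups):
    """Storage for the quantized weights alone, in gigabytes (1e9 bytes).

    Ignores quantization scales, metadata, and any unquantized tensors.
    """
    total_bits = sum(n * bits for _, n, bits in groups)
    return total_bits / 8 / 1e9


# Hypothetical allocation (counts in parameters); not taken from the paper.
groups = [
    ("attention", 25_000_000, 4),
    ("mlp", 60_000_000, 2),
    ("embeddings", 15_000_000, 8),
]

print(average_bit_width(groups))  # (100M + 120M + 120M) / 100M params
print(weight_storage_gb(groups))
```

Raising or lowering the bit-width of the largest group (here, the MLP) moves the average the most, which is why fine-grained control over where bits are spent matters for hitting a target budget.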
Findings
The 1.88-bit Qwen3-0.6B-EdgeRazor outperforms state-of-the-art 2-bit baselines by 11.27% in accuracy under weight-activation quantization.
Reduces storage from 1.11 GB to 0.19 GB at 1.58-bit precision.
Achieves 15.16× faster decoding than the 16-bit baseline.
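The "1.58-bit" label conventionally refers to ternary weights in {-1, 0, +1}, since three states carry log2(3) ≈ 1.585 bits. As an illustration only, the sketch below uses the absmean ternary quantizer popularized by BitNet b1.58. This is a swapped-in, well-known technique, not EdgeRazor's own quantizer. The sketch also works through the compression ratio implied by the reported 1.11 GB → 0.19 GB figures.

```python
import math


def ternary_quantize(weights, eps=1e-8):
    """Absmean ternary quantization (BitNet b1.58 style), not EdgeRazor's method.

    Divides by the mean absolute weight, rounds, and clips to {-1, 0, +1}.
    Returns the integer codes and the scale needed to dequantize.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map ternary codes back to approximate real-valued weights."""
    return [c * scale for c in codes]


codes, scale = ternary_quantize([0.4, -1.2, 0.05, 0.9])
print(codes, scale)               # ternary codes and the shared scale

# Information content of one ternary weight.
bits_per_weight = math.log2(3)    # ≈ 1.585

# Ratio implied by the reported storage figures.
reported_ratio = 1.11 / 0.19      # ≈ 5.84x
# Ratio if every 16-bit weight became an ideal ternary weight.
ideal_ratio = 16 / bits_per_weight
```

The reported ratio (about 5.84×) is lower than the idealized one (about 10.09×). The source does not say which components account for the difference.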
Abstract
Quantization has emerged as a mainstream approach for deploying Large Language Models (LLMs) on resource-constrained devices, yet reducing precision below 4 bits typically causes severe performance degradation or prohibitive retraining costs. In this paper, we propose EdgeRazor, a lightweight framework for LLMs via Mixed-Precision Quantization-Aware Distillation. It contains three modules: Structural Quantization with Mixed Precision for fine-grained control of bit-widths, Layer-Adaptive Feature Distillation that dynamically selects the most informative features for alignment, and Entropy-Aware KL Divergence for forward-reverse balance on both human-annotated and distilled datasets. Evaluations on the MobileLLM and Qwen families show that under weight-activation quantization, the 1.88-bit Qwen3-0.6B-EdgeRazor outperforms the state-of-the-art 2-bit baselines by 11.27 and…
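The abstract names an entropy-aware KL divergence that balances forward and reverse KL but does not give its formula. The sketch below is a hypothetical stand-in. It blends forward KL(teacher ‖ student), which is mass-covering, with reverse KL(student ‖ teacher), which is mode-seeking. The weight `alpha` is the teacher's entropy normalized by log(V). Both that weighting rule and the function names are illustrative assumptions, not the paper's definition.

```python
import math


def kl(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def entropy_weighted_kl(teacher, student):
    """Hypothetical forward/reverse KL blend weighted by teacher entropy.

    alpha = H(teacher) / log(V) lies in [0, 1]. A confident teacher (alpha near 0)
    weights forward KL; an uncertain teacher (alpha near 1) weights reverse KL.
    This rule is an assumption made for illustration only.
    """
    vocab = len(teacher)
    alpha = entropy(teacher) / math.log(vocab) if vocab > 1 else 0.0
    forward = kl(teacher, student)
    reverse = kl(student, teacher)
    return (1 - alpha) * forward + alpha * reverse


teacher = [0.7, 0.2, 0.1]
student = [0.5, 0.3, 0.2]
print(entropy_weighted_kl(teacher, student))
```

In a real distillation loop, this per-token loss would be computed from softmax outputs over the vocabulary and averaged across positions. The scalar version here only shows how the two KL directions are combined.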
Code & Models
- 🤗 zhangsq-nju/Qwen3-1.7B-EdgeRazor-GGUF (model) · 1.0k downloads · ♡ 10
- 🤗 zhangsq-nju/Qwen3-0.6B-EdgeRazor-4bit (model) · 72 downloads · ♡ 6
- 🤗 zhangsq-nju/Qwen3-0.6B-EdgeRazor-2.79bit (model) · 176 downloads
- 🤗 zhangsq-nju/Qwen3-0.6B-EdgeRazor-1.88bit (model) · 50 downloads
- 🤗 zhangsq-nju/Qwen3-0.6B-EdgeRazor-1.58bit (model) · 61 downloads
- 🤗 zhangsq-nju/Qwen3-1.7B-EdgeRazor-4bit (model) · 53 downloads
- 🤗 zhangsq-nju/Qwen3-1.7B-EdgeRazor-2.79bit (model) · 226 downloads
- 🤗 zhangsq-nju/Qwen3-1.7B-EdgeRazor-1.88bit (model) · 50 downloads
- 🤗 zhangsq-nju/Qwen3-1.7B-EdgeRazor-1.58bit (model) · 53 downloads
- 🤗 zhangsq-nju/MobileLLM-350M-EdgeRazor-4bit (model) · 38 downloads
