Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis
Hong Huang, Dapeng Wu

TL;DR
Quaff is a quantized parameter-efficient fine-tuning method for large language models that suppresses activation outliers, reducing latency and memory use while maintaining or improving accuracy.
Contribution
The paper proposes the Outlier Spatial Stability Hypothesis and a quantized fine-tuning framework that dynamically suppresses outliers, reducing resource demands without sacrificing performance.
Findings
Achieves a 1.73x latency reduction on the GPQA benchmark.
Reduces memory usage by 30% compared to full-precision fine-tuning.
Improves accuracy by 0.6% on the Phi-3 model.
Abstract
Large language models (LLMs) have made exciting achievements across various domains, yet their deployment on resource-constrained personal devices remains hindered by the prohibitive computational and memory demands of task-specific fine-tuning. While quantization offers a pathway to efficiency, existing methods struggle to balance performance and overhead, either incurring high computational/memory costs or failing to address activation outliers, a critical bottleneck in quantized fine-tuning. To address these challenges, we propose the Outlier Spatial Stability Hypothesis (OSSH): During fine-tuning, certain activation outlier channels retain stable spatial positions across training iterations. Building on OSSH, we propose Quaff, a Quantized parameter-efficient fine-tuning framework for LLMs, optimizing low-precision activation representations through targeted momentum scaling. Quaff…
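To make the hypothesis concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the synthetic activations, the helper names (`toy_activations`, `channel_peaks`, `top_k`, `fake_quant_int8`, `error_on`, `run`), the choice of the two "hot" channels, and the exponential-moving-average update used as a stand-in for "momentum scaling" are all assumptions made for illustration. The sketch checks whether the same channels stay the largest across steps, then compares per-tensor int8 rounding error on the non-outlier channels with and without shrinking the outlier channels first.

```python
def toy_activations(step, n_tokens=4, n_ch=8, hot=(2, 5)):
    """Deterministic toy activations; channels in `hot` are 100x larger (assumed)."""
    rows = []
    for t in range(n_tokens):
        row = []
        for c in range(n_ch):
            base = ((step * 7 + t * 3 + c * 5) % 11 - 5) / 10.0  # in [-0.5, 0.5]
            row.append(base * (100.0 if c in hot else 1.0))
        rows.append(row)
    return rows


def channel_peaks(acts):
    """Largest absolute activation per channel."""
    return [max(abs(row[c]) for row in acts) for c in range(len(acts[0]))]


def top_k(peaks, k):
    """Indices of the k channels with the largest peaks."""
    ranked = sorted(range(len(peaks)), key=lambda c: peaks[c], reverse=True)
    return set(ranked[:k])


def fake_quant_int8(acts):
    """Per-tensor symmetric int8 quantization followed by dequantization."""
    max_abs = max(abs(v) for row in acts for v in row)
    q_step = max_abs / 127.0
    return [[round(v / q_step) * q_step for v in row] for row in acts]


def error_on(acts, recon, channels):
    """Mean absolute reconstruction error restricted to `channels`."""
    errs = [abs(a[c] - r[c]) for a, r in zip(acts, recon) for c in channels]
    return sum(errs) / len(errs)


def run(n_steps=6, k=2, beta=0.9):
    ref, scale = None, {}
    overlaps, plain_err, scaled_err = [], [], []
    for step in range(n_steps):
        acts = toy_activations(step)
        peaks = channel_peaks(acts)
        outliers = top_k(peaks, k)
        if ref is None:
            # Fix the outlier set once, as the hypothesis would permit.
            ref = outliers
            scale = {c: peaks[c] for c in ref}
        # Jaccard overlap between this step's outliers and the fixed set.
        overlaps.append(len(outliers & ref) / len(outliers | ref))
        # Assumed momentum (EMA) update of the per-channel scale.
        for c in ref:
            scale[c] = beta * scale[c] + (1 - beta) * peaks[c]
        normal = [c for c in range(len(peaks)) if c not in ref]
        # Baseline: quantize the raw activations.
        plain_err.append(error_on(acts, fake_quant_int8(acts), normal))
        # Shrink only the outlier channels, quantize, then undo the scaling.
        shrunk = [[v / scale[c] if c in ref else v for c, v in enumerate(row)]
                  for row in acts]
        deq = fake_quant_int8(shrunk)
        restored = [[v * scale[c] if c in ref else v for c, v in enumerate(row)]
                    for row in deq]
        scaled_err.append(error_on(acts, restored, normal))
    return overlaps, plain_err, scaled_err


if __name__ == "__main__":
    overlaps, plain_err, scaled_err = run()
    for s, (o, p, q) in enumerate(zip(overlaps, plain_err, scaled_err)):
        print(f"step {s}: overlap={o:.2f} plain={p:.4f} scaled={q:.4f}")
```

In this toy setup the overlap is 1.0 at every step by construction, because the hot channels are hard-coded; on a real model the overlap would have to be measured. The scale is multiplied back after dequantization only so that the error can be compared against the original values.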
Taxonomy
Topics: Fault Detection and Control Systems · Neural Networks and Applications
