Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis

Hong Huang; Dapeng Wu

arXiv:2505.14742 · cs.LG · June 2, 2025

TL;DR

Quaff introduces a novel quantized fine-tuning method for large language models that efficiently suppresses activation outliers, achieving significant speed and memory improvements while maintaining or improving accuracy.

Contribution

The paper proposes the Outlier Spatial Stability Hypothesis and a quantized fine-tuning framework that dynamically suppresses outliers, reducing resource demands without sacrificing performance.
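One way to picture the hypothesis is to compare which channels count as outliers at two different training iterations. The sketch below is illustrative only and is not taken from the paper or its repository: the detection rule (peak magnitude above `ratio` times the median channel peak), the helper names `outlier_channels` and `jaccard`, and the toy activations are all assumptions.

```python
def outlier_channels(acts, ratio=5.0):
    """Return indices of channels whose peak |activation| exceeds
    `ratio` times the median per-channel peak (hypothetical criterion)."""
    n_channels = len(acts[0])
    peaks = [max(abs(row[c]) for row in acts) for c in range(n_channels)]
    median = sorted(peaks)[n_channels // 2]
    return {c for c, p in enumerate(peaks) if p > ratio * median}


def jaccard(a, b):
    """Overlap of two index sets: 1.0 means identical outlier positions."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Toy activations (tokens x channels) at two training iterations.
# Channel 1 is large in both; its exact values change, its position does not.
acts_step0 = [[0.1, 9.0, -0.2, 0.3], [0.2, -8.5, 0.1, -0.4]]
acts_step1 = [[-0.3, 7.5, 0.2, 0.1], [0.1, 10.0, -0.1, 0.2]]

stable = jaccard(outlier_channels(acts_step0), outlier_channels(acts_step1))
print(stable)
```

Under OSSH, an overlap score like this would stay close to 1 across iterations for the relevant channels, which is what would let a method fix the outlier set up front instead of re-detecting it at every step.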

Findings

01. Achieves a 1.73x latency reduction on the GPQA benchmark.

02. Reduces memory usage by 30% compared to full-precision fine-tuning.

03. Improves accuracy by 0.6% on the Phi-3 model.

Abstract

Large language models (LLMs) have made exciting achievements across various domains, yet their deployment on resource-constrained personal devices remains hindered by the prohibitive computational and memory demands of task-specific fine-tuning. While quantization offers a pathway to efficiency, existing methods struggle to balance performance and overhead, either incurring high computational/memory costs or failing to address activation outliers, a critical bottleneck in quantized fine-tuning. To address these challenges, we propose the Outlier Spatial Stability Hypothesis (OSSH): During fine-tuning, certain activation outlier channels retain stable spatial positions across training iterations. Building on OSSH, we propose Quaff, a Quantized parameter-efficient fine-tuning framework for LLMs, optimizing low-precision activation representations through targeted momentum scaling. Quaff…
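The truncated abstract does not spell out how "targeted momentum scaling" works, so the following is only a rough sketch of the general idea under stated assumptions: per-channel scale factors are kept only for a fixed set of outlier channels, they are updated as an exponential moving average of the observed peak magnitude, and activations are divided by them before symmetric int8 quantization. The function names, the `momentum` value, and the update rule are hypothetical and should not be read as the authors' algorithm.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization; returns (ints, step size)."""
    peak = max(abs(v) for v in values) or 1.0
    step = peak / 127.0
    return [round(v / step) for v in values], step


def update_scales(scales, acts, outliers, momentum=0.9):
    """EMA update of the scale for outlier channels only (assumed rule)."""
    new = dict(scales)
    for c in outliers:
        peak = max(abs(row[c]) for row in acts)
        new[c] = momentum * scales.get(c, peak) + (1 - momentum) * peak
    return new


def max_error(row, scales):
    """Quantize row / scale, map back, and report the worst absolute error."""
    scaled = [v / scales.get(c, 1.0) for c, v in enumerate(row)]
    ints, step = quantize_int8(scaled)
    restored = [q * step * scales.get(c, 1.0) for c, q in enumerate(ints)]
    return max(abs(a - b) for a, b in zip(row, restored))


outliers = {1}  # assumed fixed, in the spirit of OSSH
acts_step0 = [[0.1, 9.0, -0.2, 0.3], [0.2, -8.5, 0.1, -0.4]]
acts_step1 = [[-0.3, 7.5, 0.2, 0.1], [0.1, 10.0, -0.1, 0.2]]

scales = update_scales({}, acts_step0, outliers)       # initialised from the peak
err_plain = max_error(acts_step0[0], {})               # no outlier suppression
err_scaled = max_error(acts_step0[0], scales)          # outlier channel scaled down
scales = update_scales(scales, acts_step1, outliers)   # momentum update at next step
print(err_plain, err_scaled, scales)
```

In this toy case the single large channel dictates the quantization step for the whole row, so scaling it down first leaves finer resolution for the remaining channels. In a linear layer, the same factor can be multiplied into the matching weight rows, since (x / s) · (s · w) = x · w, so the layer output is unchanged in exact arithmetic.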

Code & Models

Repositories

little0o0/quaff
PyTorch · Official

Taxonomy

Topics: Fault Detection and Control Systems · Neural Networks and Applications