Outlier-weighed Layerwise Sampling for LLM Fine-tuning

Pengxiang Li; Lu Yin; Xiaowei Gao; Shiwei Liu

arXiv:2405.18380·cs.LG·June 10, 2025

TL;DR

The paper introduces Outlier-weighed Layerwise Sampling (OWS), a memory-efficient fine-tuning method for LLMs that samples a few layers to fine-tune at a time, giving higher sampling probability to layers with more outliers. It outperforms baseline methods in accuracy while using less memory.

Contribution

OWS is a novel fine-tuning approach that strategically samples layers based on outlier distribution, improving performance while reducing memory costs.
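
The sampling idea described above can be illustrated with a small sketch. This is not the authors' implementation: this summary does not spell out the outlier criterion, so the code assumes a simple magnitude rule (an entry counts as an outlier when its absolute value exceeds `tau` times the layer's mean absolute value). The names `outlier_ratio`, `sampling_probs`, `sample_layers`, and the parameters `tau` and `eps` are all hypothetical.

```python
import random


def outlier_ratio(weights, tau=3.0):
    """Fraction of entries whose magnitude exceeds tau times the layer's
    mean magnitude. ASSUMPTION: a stand-in outlier criterion, not
    necessarily the metric used in the paper."""
    mags = [abs(w) for w in weights]
    mean_mag = sum(mags) / len(mags)
    return sum(m > tau * mean_mag for m in mags) / len(mags)


def sampling_probs(layers, tau=3.0, eps=1e-8):
    """Normalize per-layer outlier ratios into a probability distribution.
    eps keeps every layer's probability strictly positive."""
    ratios = [outlier_ratio(w, tau) + eps for w in layers]
    total = sum(ratios)
    return [r / total for r in ratios]


def sample_layers(probs, k, rng):
    """Draw k distinct layer indices without replacement, weighted by probs."""
    remaining = list(range(len(probs)))
    weights = list(probs)
    chosen = []
    for _ in range(min(k, len(remaining))):
        pos = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pos))
        weights.pop(pos)
    return sorted(chosen)


if __name__ == "__main__":
    # Toy "layers" as flat weight lists: none, 5%, and 10% large entries.
    toy_layers = [
        [0.1] * 100,
        [0.1] * 95 + [5.0] * 5,
        [0.1] * 90 + [5.0] * 10,
    ]
    probs = sampling_probs(toy_layers)
    print("sampling probabilities:", probs)
    print("layers to unfreeze this step:", sample_layers(probs, 2, random.Random(0)))
```

In a training loop, a routine like `sample_layers` would be called periodically to decide which layers have their pre-trained weights unfrozen, with all other layers kept frozen; how often to resample and how many layers to pick are left open here.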

Findings

01. OWS outperforms baseline fine-tuning methods across multiple benchmarks.

02. OWS achieves an accuracy gain of up to 1.1% on Commonsense Reasoning.

03. OWS enables fine-tuning of 7B LLMs with only 21 GB of memory.

Abstract

The rapid advancements in Large Language Models (LLMs) have revolutionized various natural language processing tasks. However, the substantial size of LLMs presents significant challenges in training or fine-tuning. While parameter-efficient approaches such as low-rank adaptation (LoRA) have gained popularity, they often compromise performance compared to full-rank fine-tuning. In this paper, we propose Outlier-weighed Layerwise Sampling (OWS), a new memory-efficient fine-tuning approach, inspired by the layerwise outlier distribution of LLMs. Unlike LoRA, which adds extra adapters to all layers, OWS strategically assigns higher sampling probabilities to layers with more outliers, selectively sampling only a few layers and fine-tuning their pre-trained weights. To further increase the number of fine-tuned layers without a proportional rise in memory costs, we incorporate gradient…

Taxonomy

Topics: Image and Signal Denoising Methods · Speech and Audio Processing · Neural Networks and Applications