GEM: A Scale-Aware and Distribution-Sensitive Sparse Fine-Tuning Framework for Effective Downstream Adaptation

Sungmin Kang; Jisoo Kim; Salman Avestimehr; Sunwoo Lee

arXiv:2508.16191 · cs.LG · August 25, 2025

TL;DR

GEM is a sparse fine-tuning framework that selects parameters by the size of their updates relative to their pre-trained values and sets each layer's tuning budget from the entropy of its parameter values, improving downstream adaptation while updating only 0.1% of parameters.

Contribution

GEM introduces a scale-aware, distribution-sensitive approach to sparse fine-tuning, improving adaptation efficiency and effectiveness over existing methods.

Findings

01 Achieves up to 1.6% accuracy improvement over full fine-tuning.

02 Updates only 0.1% of parameters, reducing computational cost.

03 Effective on both general and domain-specific tasks.

Abstract

Parameter-efficient fine-tuning (PEFT) has become a popular way to adapt large pre-trained models to new tasks. Most PEFT methods update only a small subset of parameters while freezing the rest, avoiding redundant computation. As they maximize the absolute size of the updates without regard to the parameters' original scale, the resulting changes in model behavior can be minimal. In contrast, we maximize updates relative to each parameter's scale, yielding more meaningful downstream adaptation. We propose Gradient-to-Weight Ratio and Entropy-guided Masking (GEM), a parameter scale-aware, distribution-sensitive sparse fine-tuning framework. GEM prioritizes parameters whose updates are significant in proportion to their initial pre-trained values. It also adaptively determines how many parameters to tune at each layer based on the entropy of parameter values, thereby making the most…
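
The two ideas in the abstract, scoring parameters relative to their scale and setting per-layer budgets from entropy, can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulas, so everything here is an assumption for illustration rather than the authors' implementation: the score is taken as |gradient| / (|weight| + eps), the entropy is computed from a histogram of weight values, and the budget is split across layers in proportion to that entropy. The function names, `eps`, and `bins` are hypothetical.

```python
import numpy as np

def gradient_to_weight_ratio(weights, grads, eps=1e-8):
    # Assumed score: gradient magnitude relative to the weight's own magnitude.
    return np.abs(grads) / (np.abs(weights) + eps)

def histogram_entropy(weights, bins=32):
    # Shannon entropy (in nats) of a histogram of the weight values.
    counts, _ = np.histogram(weights, bins=bins)
    probs = counts[counts > 0] / counts.sum()
    return float(-(probs * np.log(probs)).sum())

def allocate_budget(layers, total_budget, bins=32):
    # Assumed rule: split the budget across layers in proportion to entropy.
    ent = np.array([histogram_entropy(w, bins) for w in layers])
    if ent.sum() == 0:
        shares = np.full(len(layers), 1.0 / len(layers))
    else:
        shares = ent / ent.sum()
    sizes = np.array([w.size for w in layers])
    # Never allocate more trainable parameters than a layer actually has.
    return np.minimum(np.floor(shares * total_budget).astype(int), sizes)

def top_k_mask(scores, k):
    # Boolean mask marking the k highest-scoring parameters as trainable.
    mask = np.zeros(scores.size, dtype=bool)
    if k > 0:
        idx = np.argpartition(scores.ravel(), -k)[-k:]
        mask[idx] = True
    return mask.reshape(scores.shape)

# Toy layer: the largest gradient sits on the largest weight.
weights = np.array([10.0, 0.1, 1.0, 5.0])
grads = np.array([1.0, 0.05, 0.2, 0.1])

by_ratio = top_k_mask(gradient_to_weight_ratio(weights, grads), k=2)
by_magnitude = top_k_mask(np.abs(grads), k=2)
print(by_ratio)      # selects indices 1 and 2
print(by_magnitude)  # selects indices 0 and 2
```

In this toy layer, selecting by absolute gradient picks the weight of 10.0, whose relative change would be small, while selecting by ratio instead picks the weight of 0.1, whose update is large in proportion to its pre-trained value. During training, such a mask would typically be multiplied into the gradients so that unselected parameters stay frozen.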
