TL;DR
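GEM is a sparse fine-tuning framework that picks which parameters to update by how large each update is relative to the parameter's pre-trained value, and sets how many parameters to tune in each layer from the entropy of that layer's parameter values. It reports better downstream performance while updating only a small fraction of the model's parameters.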
Contribution
GEM introduces a scale-aware, distribution-sensitive approach to sparse fine-tuning, improving adaptation efficiency and effectiveness over existing methods.
Findings
Achieves up to 1.6% accuracy improvement over full fine-tuning.
Updates only 0.1% of parameters, reducing computational cost.
Effective on both general and domain-specific tasks.
Abstract
Parameter-efficient fine-tuning (PEFT) has become a popular way to adapt large pre-trained models to new tasks. Most PEFT methods update only a small subset of parameters while freezing the rest, avoiding redundant computation. However, because they maximize the absolute size of the updates without regard to the parameters' original scale, the resulting changes in model behavior can be minimal. In contrast, we maximize updates relative to each parameter's scale, yielding more meaningful downstream adaptation. We propose Gradient-to-Weight Ratio and Entropy-guided Masking (GEM), a parameter scale-aware, distribution-sensitive sparse fine-tuning framework. GEM prioritizes parameters whose updates are significant in proportion to their initial pre-trained values. It also adaptively determines how many parameters to tune at each layer based on the entropy of parameter values, thereby making the most…
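The two ideas in the abstract, scoring parameters by update size relative to their pre-trained scale and setting a per-layer budget from the entropy of parameter values, can be illustrated with a small NumPy sketch. This is not the paper's implementation: the abstract gives no formulas, so the score |gradient| / (|weight| + eps), the histogram-based entropy, the rule that a layer's budget is proportional to its entropy, and all function names and defaults (`eps`, `bins`, `density`) are assumptions made for illustration.

```python
import numpy as np


def gradient_to_weight_ratio(grad, weight, eps=1e-8):
    """Assumed score: gradient magnitude relative to the weight's own magnitude."""
    return np.abs(grad) / (np.abs(weight) + eps)


def value_entropy(weight, bins=32):
    """Shannon entropy (in nats) of a histogram of one layer's parameter values."""
    counts, _ = np.histogram(weight, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())


def allocate_budget(weights, total_k):
    """Split total_k across layers in proportion to entropy (an assumed rule)."""
    ent = np.array([value_entropy(w) for w in weights])
    if ent.sum() == 0:
        # Degenerate case: every layer is constant, so fall back to equal shares.
        share = np.full(len(weights), 1.0 / len(weights))
    else:
        share = ent / ent.sum()
    budgets = np.floor(share * total_k).astype(int)
    sizes = np.array([w.size for w in weights])
    return np.minimum(budgets, sizes)  # a layer cannot tune more than it has


def build_masks(weights, grads, density=0.001):
    """Return one boolean mask per layer; True marks a parameter left trainable."""
    total = sum(w.size for w in weights)
    total_k = max(1, int(round(density * total)))
    budgets = allocate_budget(weights, total_k)
    masks = []
    for w, g, k in zip(weights, grads, budgets):
        mask = np.zeros(w.shape, dtype=bool)
        if k > 0:
            score = gradient_to_weight_ratio(g, w).ravel()
            top = np.argpartition(score, -k)[-k:]  # indices of the k largest scores
            mask.flat[top] = True
        masks.append(mask)
    return masks
```

With weights `[10.0, 0.1]` and gradients `[1.0, 0.5]`, a ranking by absolute gradient would pick the first parameter, while the ratio score picks the second, because its gradient is large relative to its small pre-trained value. Whether higher-entropy layers should receive more or fewer trainable parameters is not stated in the truncated abstract; the proportional rule here is only one plausible reading.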