GateRA: Token-Aware Modulation for Parameter-Efficient Fine-Tuning
Jie Ou, Shuaihong Jiang, Yingjun Du, Cees G. M. Snoek

TL;DR
GateRA adds token-aware modulation to parameter-efficient fine-tuning of large models: an adaptive, regularized gate adjusts the strength of the PEFT update per token, so that adaptation concentrates on challenging tokens.
Contribution
The paper proposes a token-aware gating framework with entropy regularization that makes fine-tuning dynamic and selective, in contrast to the static updates of prior PEFT methods.
Findings
GateRA outperforms prior PEFT methods on reasoning benchmarks.
Adaptive gating effectively suppresses redundant token updates.
Regularization leads to interpretable, sparse adaptation patterns.
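
To illustrate why an entropy-based regularizer can encourage sparse gating, here is a minimal sketch. The summary above names entropy regularization but gives no formula, so the binary-entropy form below, the function name `gate_entropy`, and the idea of minimizing it as a penalty are all assumptions, not the authors' definition.

```python
import numpy as np

def gate_entropy(g, eps=1e-8):
    """Mean binary entropy (in nats) of gate values g in (0, 1).

    Hypothetical regularizer: the source does not give the exact form.
    """
    g = np.clip(g, eps, 1.0 - eps)  # avoid log(0)
    h = -(g * np.log(g) + (1.0 - g) * np.log(1.0 - g))
    return float(h.mean())

# Gates stuck at 0.5 are maximally uncertain; gates near 0 or 1 are decisive.
undecided = np.array([0.5, 0.5, 0.5, 0.5])
decisive = np.array([0.01, 0.99, 0.02, 0.98])

print(gate_entropy(undecided))  # ln 2, about 0.693
print(gate_entropy(decisive))   # much smaller
```

Under this assumed form, adding the entropy to the training loss would push each token's gate toward either "off" or "on", which is one way the sparse, interpretable patterns mentioned in the findings could arise.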
Abstract
Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, DoRA, and HiRA, enable lightweight adaptation of large pre-trained models via low-rank updates. However, existing PEFT approaches apply static, input-agnostic updates to all tokens, disregarding the varying importance and difficulty of different inputs. This uniform treatment can lead to overfitting on trivial content or under-adaptation on more informative regions, especially in autoregressive settings with distinct prefill and decoding dynamics. In this paper, we propose GateRA, a unified framework that introduces token-aware modulation to dynamically adjust the strength of PEFT updates. By incorporating adaptive gating into standard PEFT branches, GateRA enables selective, token-level adaptation, preserving pre-trained knowledge for well-modeled inputs while focusing capacity on challenging cases. Empirical visualizations…
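
The idea of gating a PEFT branch per token can be sketched as follows, using LoRA as the branch. This is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify how the gate is computed, so the sigmoid of a linear projection of each token's input, and the names `gated_lora_forward`, `w_gate`, and `b_gate`, are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_lora_forward(x, W0, A, B, w_gate, b_gate=0.0):
    """Frozen linear layer plus a LoRA update scaled by a per-token gate.

    x: (tokens, d_in), W0: (d_in, d_out), A: (d_in, r), B: (r, d_out),
    w_gate: (d_in,). The gate form is an assumption for illustration.
    """
    base = x @ W0                               # frozen pre-trained path
    delta = (x @ A) @ B                         # low-rank update, same for all tokens in plain LoRA
    g = sigmoid(x @ w_gate + b_gate)[:, None]   # one scalar gate per token, in (0, 1)
    return base + g * delta, g

rng = np.random.default_rng(0)
tokens, d_in, d_out, r = 4, 8, 6, 2
x = rng.normal(size=(tokens, d_in))
W0 = rng.normal(size=(d_in, d_out))
A = rng.normal(size=(d_in, r))
B = rng.normal(size=(r, d_out))
w_gate = rng.normal(size=d_in)

y, g = gated_lora_forward(x, W0, A, B, w_gate)
print(y.shape)    # (4, 6)
print(g.ravel())  # per-token gate strengths
```

In this sketch, a gate near 0 leaves a token on the frozen pre-trained path, while a gate near 1 applies the full low-rank update, which mirrors the abstract's description of preserving pre-trained knowledge for well-modeled inputs and focusing capacity on challenging ones.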
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
