GateRA: Token-Aware Modulation for Parameter-Efficient Fine-Tuning

Jie Ou; Shuaihong Jiang; Yingjun Du; Cees G. M. Snoek

arXiv:2511.17582 · cs.LG · December 23, 2025

TL;DR

GateRA adds token-aware adaptive gating, with regularization, to parameter-efficient fine-tuning branches of large models, so that updates concentrate on challenging tokens while well-modeled tokens stay close to the pre-trained model.

Contribution

The paper proposes a token-aware gating framework with entropy regularization that makes PEFT updates dynamic and selective, in contrast to the static, input-agnostic updates of existing PEFT methods.

Findings

01 GateRA outperforms prior PEFT methods on reasoning benchmarks.

02 Adaptive gating effectively suppresses redundant token updates.

03 Regularization leads to interpretable, sparse adaptation patterns.

Abstract

Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, DoRA, and HiRA, enable lightweight adaptation of large pre-trained models via low-rank updates. However, existing PEFT approaches apply static, input-agnostic updates to all tokens, disregarding the varying importance and difficulty of different inputs. This uniform treatment can lead to overfitting on trivial content or under-adaptation on more informative regions, especially in autoregressive settings with distinct prefill and decoding dynamics. In this paper, we propose GateRA, a unified framework that introduces token-aware modulation to dynamically adjust the strength of PEFT updates. By incorporating adaptive gating into standard PEFT branches, GateRA enables selective, token-level adaptation, preserving pre-trained knowledge for well-modeled inputs while focusing capacity on challenging cases. Empirical visualizations…
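
The gating idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the gate's parameterization or the exact regularizer, so the code below assumes a LoRA-style branch `B (A x)`, a scalar per-token gate computed as a sigmoid of a linear projection, and a binary-entropy penalty as one plausible form of entropy regularization. All names (`gated_lora_forward`, `w_gate`, `b_gate`, `entropy_penalty`) are hypothetical.

```python
import math


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gate_value(x, w_gate, b_gate):
    """Scalar gate in (0, 1) for one token.

    Assumption: the gate is a sigmoid over a linear projection of the
    token representation; the paper's actual gate may differ.
    """
    return sigmoid(sum(w * xi for w, xi in zip(w_gate, x)) + b_gate)


def gated_lora_forward(x, W, A, B, w_gate, b_gate):
    """Return (h, g) with h = W x + g * B (A x).

    W is the frozen pre-trained weight; A and B form the low-rank branch.
    g near 0 leaves the pre-trained output untouched for this token,
    g near 1 applies the full low-rank update.
    """
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    g = gate_value(x, w_gate, b_gate)
    return [b + g * d for b, d in zip(base, delta)], g


def binary_entropy(g, eps=1e-12):
    """Entropy of a Bernoulli(g) gate, clamped for numerical safety."""
    g = min(max(g, eps), 1.0 - eps)
    return -(g * math.log(g) + (1.0 - g) * math.log(1.0 - g))


def entropy_penalty(gates):
    """Mean binary entropy over tokens (assumed regularizer form).

    Minimizing it pushes each gate toward 0 or 1, i.e. toward a
    near-binary decision about which tokens receive the update.
    """
    return sum(binary_entropy(g) for g in gates) / len(gates)


if __name__ == "__main__":
    # Toy sizes: hidden dimension 2, rank 1. Values are illustrative only.
    W = [[1.0, 0.0], [0.0, 1.0]]
    A = [[1.0, 1.0]]
    B = [[1.0], [0.0]]
    x = [1.0, 0.0]
    for b_gate in (-20.0, 0.0, 20.0):
        h, g = gated_lora_forward(x, W, A, B, [0.0, 0.0], b_gate)
        print(f"b_gate={b_gate:+.0f}  gate={g:.4f}  h={h}")
```

In this sketch a strongly negative gate logit reproduces the frozen model's output for that token, while a strongly positive one recovers an ordinary LoRA-style update; the penalty term would be added to the task loss with some weighting coefficient, which is likewise an assumption here.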

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications