TS-PEFT: Unveiling Token-Level Redundancy in Parameter-Efficient Fine-Tuning
Dabiao Ma, Ziming Dai, Zhimin Xin, Shu Wang, Jian Yang, Haojun Fei

TL;DR
TS-PEFT introduces a framework that identifies and discards redundant token updates during fine-tuning, reducing computation while matching or exceeding the performance of dense baselines such as LoRA and DoRA.
Contribution
The paper reveals pervasive token-level redundancy in PEFT and proposes TS-PEFT, a novel method that dynamically identifies and prunes redundant token updates during fine-tuning.
Findings
Discarding 30%-70% of token updates maintains or improves performance.
Token-level sparsity outperforms weight-based importance criteria.
TS-PEFT reduces computational cost without sacrificing accuracy.
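To make the idea of discarding token updates concrete, here is a minimal sketch of a LoRA-style layer that applies its low-rank update to only some tokens. The gating rule (keep an update when its norm is at least `tau` times the norm of the frozen output) and the names `token_sparse_lora`, `matvec`, and `tau` are assumptions for illustration; this summary does not state the criterion the paper actually uses. The sketch also computes every update before gating, so it illustrates only the masking, not any compute saving.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def token_sparse_lora(X, W, A, B, tau):
    """Frozen layer W plus a low-rank update B @ A, gated per token.

    HYPOTHETICAL gating rule: a token keeps its update only when the update
    norm is at least tau times the norm of the frozen output.
    Returns (outputs, fraction of tokens whose update was discarded).
    """
    outputs, kept = [], 0
    for x in X:                              # one token embedding at a time
        base = matvec(W, x)                  # frozen pretrained path
        delta = matvec(B, matvec(A, x))      # low-rank update for this token
        base_norm = math.sqrt(sum(b * b for b in base))
        delta_norm = math.sqrt(sum(d * d for d in delta))
        if delta_norm >= tau * base_norm:
            outputs.append([b + d for b, d in zip(base, delta)])
            kept += 1
        else:
            outputs.append(base)             # update discarded for this token
    return outputs, 1 - kept / len(X)

# Two tokens, a 2x2 identity as the frozen layer, and a rank-1 update.
X = [[1.0, 0.0], [0.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0]]
B = [[1.0], [0.0]]
outputs, discarded = token_sparse_lora(X, W, A, B, tau=0.5)
print(outputs, discarded)
```

Raising `tau` discards more token updates, which is one simple way to sweep the sparsity range mentioned in the findings.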
Abstract
Current Parameter-Efficient Fine-Tuning (PEFT) methods typically operate under an implicit assumption: once a target module is selected, every token passing through it contributes equally to the downstream task and requires a parameter update. In this paper, we challenge this convention by revealing a pervasive token-level redundancy in the fine-tuning of large models (LMs). We propose TS-PEFT, a theoretical framework utilizing proximal optimization that acts as a dynamic probe to identify token-level redundancy during the fine-tuning process. Extensive experiments demonstrate that indiscriminately updating all tokens is not only computationally superfluous but often introduces optimization noise. Surprisingly, by discarding 30%-70% of token updates, TS-PEFT consistently matches or exceeds the performance of dense baselines such as LoRA and DoRA. Our in-depth analysis shows that the…
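The abstract mentions proximal optimization without giving details. For readers unfamiliar with the term, the following sketch shows the textbook case: a proximal gradient step with an L1 penalty, whose proximal operator is elementwise soft-thresholding. Which regularizer TS-PEFT uses, and which variables it is applied to, are not stated in this summary, so the function names and the L1 choice here are assumptions rather than the paper's method.

```python
def soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_1: shrink each entry toward zero by lam."""
    out = []
    for x in v:
        mag = max(abs(x) - lam, 0.0)     # entries smaller than lam become zero
        out.append(mag if x >= 0 else -mag)
    return out

def proximal_gradient_step(v, grad, lr, lam):
    """One step: plain gradient descent, then the proximal (shrinkage) map."""
    moved = [x - lr * g for x, g in zip(v, grad)]
    return soft_threshold(moved, lr * lam)

# Small entries are driven exactly to zero, which is how sparsity arises.
print(proximal_gradient_step([1.0, 0.2], [0.0, 0.0], lr=0.5, lam=1.0))
```

The relevant property is that the shrinkage step produces exact zeros rather than merely small values, which is what lets a proximal method act as a selector.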
Taxonomy
Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
