PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention

Haonan Wang; Brian Chen; Siquan Li; Xinhe Liang; Hwee Kuan Lee; Kenji Kawaguchi; Tianyang Hu

arXiv:2506.13674 · cs.CL · April 21, 2026

TL;DR

This paper introduces PrefixMemory-Tuning, which modifies prefix-tuning by moving the learned prefix out of the attention head. As a result, the prefix no longer competes with the input prompt for attention weight, and prefix-tuning performs better when adapting large language models.

Contribution

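The paper proposes an architecture that generalizes prefix-tuning and enhances its expressiveness for adapting state-of-the-art LLMs. It also removes prefix-tuning's main limitation on such models: the tradeoff, inside each attention head, between the contribution of the input prompt and that of the learned prefix.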

Findings

1. PrefixMemory-Tuning outperforms existing prefix-tuning methods across benchmarks.

2. It achieves competitive results with modern PEFT techniques on several general benchmarks.

3. Decoupling the prefix from attention improves the effectiveness of prefix-tuning.

Abstract

Parameter-Efficient Fine-Tuning (PEFT) methods have become crucial for rapidly adapting large language models (LLMs) to downstream tasks. Prefix-Tuning, an early and effective PEFT technique, demonstrated the ability to achieve performance comparable to full fine-tuning with significantly reduced computational and memory overhead. However, despite its earlier success, its effectiveness in training modern state-of-the-art LLMs has been very limited. In this work, we demonstrate empirically that prefix-tuning underperforms on LLMs because of an inherent tradeoff between the contribution of the input prompt and the parameterized prefix within the attention head. This motivates us to introduce PrefixMemory-Tuning, an architecture that generalizes the principles of prefix-tuning while addressing its shortcomings by shifting the prefix module out of the attention head itself and improving its…
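A standard algebraic rewriting of softmax attention makes the tradeoff described in the abstract concrete. This is not necessarily the exact formulation the paper uses. When learned prefix keys and values are prepended inside an attention head, the head's output for a query q splits into two parts:

λ(q) · Attn(q, prefix) + (1 - λ(q)) · Attn(q, prompt)

Here λ(q) is the share of softmax mass that falls on the prefix keys. Because the two weights sum to one, any increase in the prefix's influence necessarily reduces the prompt's.

The sketch below checks this identity numerically and shows how λ grows with prefix length. It makes several simplifying assumptions:
- It uses a single query, unscaled dot-product scores, and toy vectors.
- It is pure Python.
- The function names are hypothetical.

It does not implement PrefixMemory-Tuning itself, since the truncated abstract does not give its details.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-query softmax attention (scores left unscaled for brevity)."""
    scores = [dot(q, k) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]


def prefix_share(q: Vector, prefix_keys: List[Vector], prompt_keys: List[Vector]) -> float:
    """lambda(q): fraction of the softmax mass assigned to the prefix keys."""
    sp = [dot(q, k) for k in prefix_keys]
    sc = [dot(q, k) for k in prompt_keys]
    m = max(sp + sc)
    z_prefix = sum(math.exp(s - m) for s in sp)
    z_prompt = sum(math.exp(s - m) for s in sc)
    return z_prefix / (z_prefix + z_prompt)


def prefix_tuned_attention(q, prefix_keys, prefix_values, prompt_keys, prompt_values):
    """Classic prefix-tuning: prefix keys/values are prepended inside the head."""
    return attend(q, prefix_keys + prompt_keys, prefix_values + prompt_values)


def decomposed(q, prefix_keys, prefix_values, prompt_keys, prompt_values):
    """The same output written as a convex mix of prefix-only and prompt-only attention."""
    lam = prefix_share(q, prefix_keys, prompt_keys)
    from_prefix = attend(q, prefix_keys, prefix_values)
    from_prompt = attend(q, prompt_keys, prompt_values)
    return [lam * a + (1.0 - lam) * b for a, b in zip(from_prefix, from_prompt)]


if __name__ == "__main__":
    q = [1.0, 0.5]
    prompt_keys = [[0.2, 0.1], [0.4, -0.3], [0.1, 0.9]]
    prompt_values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    for n in (1, 4, 16):
        # Hypothetical prefix: n copies of one learned key/value pair.
        pk, pv = [[0.3, 0.2]] * n, [[2.0, -1.0]] * n
        full = prefix_tuned_attention(q, pk, pv, prompt_keys, prompt_values)
        mix = decomposed(q, pk, pv, prompt_keys, prompt_values)
        assert all(abs(a - b) < 1e-12 for a, b in zip(full, mix))
        lam = prefix_share(q, pk, prompt_keys)
        print(f"prefix length {n:2d}: prefix share {lam:.3f}, prompt share {1 - lam:.3f}")
```

Each run prints a prompt share that shrinks as the prefix grows. This is the coupling that the abstract attributes to placing the prefix inside the attention head, and that PrefixMemory-Tuning avoids by moving the prefix module out of the head.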


Videos

PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention (SlidesLive)