TL;DR
This paper introduces PrefixMemory-Tuning, which generalizes prefix-tuning by moving the prefix module out of the attention head, and reports better performance than prefix-tuning on modern large language models.
Contribution
It proposes a new architecture that generalizes prefix-tuning, addressing its limitations and enhancing its expressiveness for state-of-the-art LLM adaptation.
Findings
PrefixMemory-Tuning outperforms existing prefix-tuning methods across benchmarks.
It achieves competitive results with modern PEFT techniques on several general benchmarks.
Decoupling the prefix from attention improves the effectiveness of prefix-tuning.
Abstract
Parameter-Efficient Fine-Tuning (PEFT) methods have become crucial for rapidly adapting large language models (LLMs) to downstream tasks. Prefix-Tuning, an early and effective PEFT technique, demonstrated the ability to achieve performance comparable to full fine-tuning with significantly reduced computational and memory overhead. However, despite its earlier success, its effectiveness in training modern state-of-the-art LLMs has been very limited. In this work, we demonstrate empirically that prefix-tuning underperforms on LLMs because of an inherent tradeoff between the contribution of the input prompt and the parameterized prefix within the attention head. This motivates us to introduce PrefixMemory-Tuning, an architecture that generalizes the principles of prefix-tuning while addressing its shortcomings by shifting the prefix module out of the attention head itself and improving its…
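The tradeoff named in the abstract can be illustrated with a small numerical sketch. In standard prefix-tuning, learned prefix keys and values are concatenated with the input's keys and values before the softmax, so the attention output is algebraically a convex combination of "attention over the prefix" and "attention over the input", weighted by a scalar `alpha`: the more softmax mass the prefix takes, the less remains for the input. The code below only checks that identity for a single query and a single head. It is not the PrefixMemory-Tuning architecture, whose details are cut off in the abstract above, and whether this decomposition is exactly the argument the authors make is an assumption. All function and variable names are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(q))
    weights = softmax([dot(k, q) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def prefix_attention(q, K, V, Pk, Pv):
    """Standard prefix-tuning: one softmax over [prefix; input]."""
    return attend(q, Pk + K, Pv + V)


def gated_form(q, K, V, Pk, Pv):
    """Same output, written as alpha * prefix_part + (1 - alpha) * input_part."""
    scale = math.sqrt(len(q))
    s_p = [dot(k, q) / scale for k in Pk]
    s_x = [dot(k, q) / scale for k in K]
    m = max(s_p + s_x)
    mass_p = sum(math.exp(s - m) for s in s_p)
    mass_x = sum(math.exp(s - m) for s in s_x)
    alpha = mass_p / (mass_p + mass_x)  # share of softmax mass on the prefix
    out_p = attend(q, Pk, Pv)
    out_x = attend(q, K, V)
    out = [alpha * a + (1 - alpha) * b for a, b in zip(out_p, out_x)]
    return out, alpha


# Toy sizes chosen for illustration only.
random.seed(0)
d, n_tokens, n_prefix = 8, 5, 3


def rand_matrix(rows):
    return [[random.gauss(0, 1) for _ in range(d)] for _ in range(rows)]


q = [random.gauss(0, 1) for _ in range(d)]
K, V = rand_matrix(n_tokens), rand_matrix(n_tokens)
Pk, Pv = rand_matrix(n_prefix), rand_matrix(n_prefix)

direct = prefix_attention(q, K, V, Pk, Pv)
gated, alpha = gated_form(q, K, V, Pk, Pv)
```

Because `alpha` and `1 - alpha` sum to one, the prefix cannot gain influence inside the head without the input losing the same amount, which is one way to read the abstract's motivation for moving the prefix module out of the attention head.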