LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction
Bo Zou, Chao Yang, Yu Qiao, Chengbin Quan, Youjian Zhao

TL;DR
LLaMA-Excitor is a lightweight fine-tuning method that enhances instruction-following in LLMs and multi-modal models by selectively adjusting attention without altering hidden states, preserving core abilities while improving performance.
Contribution
The paper introduces LLaMA-Excitor, a novel attention bypass module that improves instruction-following and multi-modal tuning without compromising pre-trained knowledge.
Findings
Reports a 6% improvement on the MMLU benchmark.
Sets a new state of the art in image captioning with 157.5 CIDEr on MSCOCO.
Maintains core capabilities while enhancing task performance.
Abstract
Existing methods to fine-tune LLMs, like Adapter, Prefix-tuning, and LoRA, which introduce extra modules or additional input sequences to inject new skills or knowledge, may compromise the innate abilities of LLMs. In this paper, we propose LLaMA-Excitor, a lightweight method that stimulates the LLMs' potential to better follow instructions by gradually paying more attention to worthwhile information. Specifically, the LLaMA-Excitor does not directly change the intermediate hidden state during the self-attention calculation of the transformer structure. We designed the Excitor block as a bypass module for the similarity score computation in LLMs' self-attention to reconstruct keys and change the importance of values by learnable prompts. LLaMA-Excitor ensures a self-adaptive allocation of additional attention to input instructions, thus effectively preserving LLMs' pre-trained knowledge…
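The mechanism described in the abstract can be illustrated with a toy sketch: learnable prompts adjust only the similarity scores, while the value vectors (and therefore the hidden states they mix into) are left as they are. This is not the authors' formulation. The function name `attention_with_bypass`, the scalar `gate`, and the way prompts are folded into key offsets are all assumptions made for illustration, and the example uses plain Python lists rather than a real LLaMA layer.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_with_bypass(queries, keys, values, prompts, gate):
    """Hypothetical sketch: prompts shift attention scores only; values are untouched."""
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)

    # Assumed form of "key reconstruction": each key attends over the
    # learnable prompts and receives a weighted mixture of them as an offset.
    key_offsets = []
    for k in keys:
        w = softmax([dot(k, p) * scale for p in prompts])
        key_offsets.append(
            [sum(wi * p[j] for wi, p in zip(w, prompts)) for j in range(d)]
        )

    outputs, weights = [], []
    for q in queries:
        base = [dot(q, k) * scale for k in keys]            # standard scores
        extra = [dot(q, ko) * scale for ko in key_offsets]  # bypass scores
        attn = softmax([b + gate * e for b, e in zip(base, extra)])
        weights.append(attn)
        # The output is still a convex combination of the ORIGINAL values.
        outputs.append(
            [sum(a * v[j] for a, v in zip(attn, values))
             for j in range(len(values[0]))]
        )
    return outputs, weights

# Toy data (arbitrary numbers, chosen only to exercise the code).
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
prompts = [[2.0, 0.0], [0.0, -1.0]]

plain_out, plain_w = attention_with_bypass(queries, keys, values, prompts, gate=0.0)
tuned_out, tuned_w = attention_with_bypass(queries, keys, values, prompts, gate=1.0)
```

With `gate=0.0` the sketch reduces to ordinary scaled dot-product attention, which mirrors the idea of starting from the pre-trained behavior and shifting attention gradually. With a non-zero gate the attention weights change, but every output row remains a weighted average of the unmodified value vectors.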
Taxonomy
Topics: Blind Source Separation Techniques · Neural Networks and Applications · Neural Networks and Reservoir Computing
Methods: Adapter
