LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction

Bo Zou; Chao Yang; Yu Qiao; Chengbin Quan; Youjian Zhao

arXiv:2404.00913 · cs.CV · April 2, 2024 · 1 citation

TL;DR

LLaMA-Excitor is a lightweight fine-tuning method that enhances instruction-following in LLMs and multi-modal models by selectively adjusting attention without altering hidden states, preserving core abilities while improving performance.

Contribution

The paper introduces LLaMA-Excitor, a novel attention bypass module that improves instruction-following and multi-modal tuning without compromising pre-trained knowledge.

Findings

1. Achieves a +6% improvement on the MMLU benchmark.

2. Sets a new state of the art in image captioning with 157.5 CIDEr on MSCOCO.

3. Maintains the model's core capabilities while improving task performance.

Abstract

Existing methods to fine-tune LLMs, like Adapter, Prefix-tuning, and LoRA, which introduce extra modules or additional input sequences to inject new skills or knowledge, may compromise the innate abilities of LLMs. In this paper, we propose LLaMA-Excitor, a lightweight method that stimulates the LLMs' potential to better follow instructions by gradually paying more attention to worthwhile information. Specifically, the LLaMA-Excitor does not directly change the intermediate hidden state during the self-attention calculation of the transformer structure. We designed the Excitor block as a bypass module for the similarity score computation in LLMs' self-attention to reconstruct keys and change the importance of values by learnable prompts. LLaMA-Excitor ensures a self-adaptive allocation of additional attention to input instructions, thus effectively preserving LLMs' pre-trained knowledge…
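
The mechanism described in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the paper's implementation: the key-reconstruction formula, the scalar `gate`, and the function names `excited_attention` and `vanilla_attention` are all assumptions made for illustration. It shows the general idea of leaving the values untouched and changing only the similarity scores through learnable prompts, so that with the gate at zero the output equals ordinary attention.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def vanilla_attention(q, k, v):
    """Plain single-head scaled dot-product attention (no causal mask)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def excited_attention(q, k, v, prompts, gate):
    """Hypothetical bypass: only the similarity scores are altered.

    ASSUMPTION: keys are "reconstructed" by letting each key attend over
    the learnable prompts and adding the gated result. The paper's actual
    formulation may differ; this only mirrors the abstract's description.
    """
    d = q.shape[-1]
    mix = softmax(k @ prompts.T / np.sqrt(d))   # (n_tokens, n_prompts)
    k_rec = k + gate * (mix @ prompts)          # reconstructed keys
    weights = softmax(q @ k_rec.T / np.sqrt(d)) # re-weighted attention
    # The values v are used unchanged, so the output is still a convex
    # combination of the original value vectors.
    return weights @ v, weights

rng = np.random.default_rng(0)
n_tokens, n_prompts, dim = 4, 2, 8
q, k, v = (rng.standard_normal((n_tokens, dim)) for _ in range(3))
prompts = rng.standard_normal((n_prompts, dim))  # stand-in for learned prompts

base = vanilla_attention(q, k, v)
out_zero, _ = excited_attention(q, k, v, prompts, gate=0.0)
out_on, w_on = excited_attention(q, k, v, prompts, gate=0.5)
```

In this sketch, a gate initialized at zero reproduces the pre-trained attention output exactly, and training the gate and prompts would only shift how much weight each existing value receives.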

Taxonomy

Topics: Blind Source Separation Techniques · Neural Networks and Applications · Neural Networks and Reservoir Computing

Methods: Adapter