LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

Renrui Zhang; Jiaming Han; Chris Liu; Peng Gao; Aojun Zhou; Xiangfei Hu; Shilin Yan; Pan Lu; Hongsheng Li; Yu Qiao

arXiv:2303.16199 · cs.CV · September 20, 2024 · 167 cites

TL;DR

LLaMA-Adapter is a lightweight fine-tuning method for LLaMA that adds only 1.2M learnable parameters to the frozen 7B model and uses zero-initialized attention, achieving high-quality instruction following and multi-modal performance at low computational cost.

Contribution

The paper proposes a novel zero-initialized attention mechanism and a lightweight adaptation approach for efficient fine-tuning of large language models like LLaMA.

Findings

01

Generates responses comparable to Alpaca, which fully fine-tunes all 7B parameters, while adding only 1.2M learnable parameters to the frozen LLaMA 7B model.

02

Extends to multi-modal instruction learning, outperforming existing methods on benchmarks.

03

Demonstrates the generalization of zero-initialized attention to vision and language tasks.
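
As a rough check on the 1.2M figure in the first finding, the count can be reproduced from the size of the adaption prompts alone. This is a back-of-the-envelope sketch: the prompt length and the number of adapted layers below are assumptions that are not stated on this page, chosen because they are consistent with the reported total. The hidden size of 4096 is that of LLaMA 7B. Gating factors are ignored because they add only a handful of scalars.

```python
# Assumed values (not stated on this page), consistent with the ~1.2M figure.
prompt_length = 10    # assumption: learnable prompt tokens per adapted layer
adapted_layers = 30   # assumption: number of higher transformer layers adapted
hidden_dim = 4096     # hidden size of LLaMA 7B

# Each adapted layer owns its own prompt matrix of shape (prompt_length, hidden_dim).
prompt_params = prompt_length * adapted_layers * hidden_dim

print(f"{prompt_params:,} learnable prompt parameters")
print(f"about {prompt_params / 1e6:.1f}M, versus roughly 7,000M frozen parameters")
```

Under these assumptions the adapter amounts to well under 0.1% of the frozen model's weights, which is why training is cheap relative to full fine-tuning.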

Abstract

We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter introduces only 1.2M learnable parameters upon the frozen LLaMA 7B model, and takes less than one hour to fine-tune on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts, and prepend them to the word tokens at higher transformer layers. Then, a zero-initialized attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA, while effectively preserving its pre-trained knowledge. With our efficient training, LLaMA-Adapter can generate high-quality responses, comparable to Alpaca with fully fine-tuned 7B parameters. Besides language commands, our approach can be simply extended to multi-modal instructions for learning image-conditioned LLaMA…
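
The zero-gating idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it handles a single query vector in pure Python, omits causal masking and multiple heads, and treats the gate as a plain scalar. The function name `zero_gated_attention` and all toy values are hypothetical. The sketch assumes that attention weights over the adaption prompts are normalized separately from those over the word tokens and then scaled by a gate that starts at zero, so the frozen model's output is unchanged at the start of training.

```python
import math

def softmax(scores):
    """Numerically stable softmax; returns [] for an empty input."""
    if not scores:
        return []
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def zero_gated_attention(query, prompt_keys, prompt_values,
                         word_keys, word_values, gate):
    """Attention output for one query over prompt tokens and word tokens.

    Assumption: prompt and word scores are normalized separately, and only
    the prompt weights are multiplied by the (learnable) scalar `gate`.
    """
    scale = 1.0 / math.sqrt(len(query))
    prompt_scores = [dot(query, k) * scale for k in prompt_keys]
    word_scores = [dot(query, k) * scale for k in word_keys]

    # The gate scales only the prompt branch; the word branch is untouched.
    prompt_weights = [gate * w for w in softmax(prompt_scores)]
    word_weights = softmax(word_scores)

    dim = len(word_values[0])
    output = [0.0] * dim
    for weight, value in zip(prompt_weights + word_weights,
                             prompt_values + word_values):
        for i in range(dim):
            output[i] += weight * value[i]
    return output

# Toy inputs (hypothetical values for illustration only).
query = [1.0, 0.0]
word_keys = [[1.0, 0.0], [0.0, 1.0]]
word_values = [[1.0, 2.0], [3.0, 4.0]]
prompt_keys = [[0.5, 0.5]]
prompt_values = [[10.0, -10.0]]

# Frozen model: no prompts at all.
frozen = zero_gated_attention(query, [], [], word_keys, word_values, gate=0.0)
# Adapter at initialization: prompts present, gate still zero.
at_init = zero_gated_attention(query, prompt_keys, prompt_values,
                               word_keys, word_values, gate=0.0)
# Adapter after some training: gate has moved away from zero.
trained = zero_gated_attention(query, prompt_keys, prompt_values,
                               word_keys, word_values, gate=0.5)
```

With the gate at zero, `at_init` matches `frozen`, so randomly initialized prompts cannot disturb the pre-trained behavior. As the gate grows during training, the prompt values are mixed in gradually, which is the sense in which the instructional cues are injected adaptively.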

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques