LongHeads: Multi-Head Attention is Secretly a Long Context Processor

Yi Lu; Xin Zhou; Wei He; Jun Zhao; Tao Ji; Tao Gui; Qi Zhang; Xuanjing Huang

arXiv:2402.10685·cs.CL·March 26, 2024·1 cites

TL;DR

LongHeads is a training-free framework that improves how large language models handle long contexts: each attention head selects and attends to only the important context chunks, so long sequences are processed efficiently without retraining.

Contribution

It introduces a chunk selection strategy in which each attention head attends to a subset of context chunks that fits within the pre-trained length, so multi-head attention as a whole can process inputs longer than the trained limit.

Findings

01. Achieves 100% accuracy at a 128k context length on the passkey retrieval task.

02. Runs in linear time and fits LLMs that use relative positional encoding.

03. Enhances existing LLMs' long-context processing without retraining.

Abstract

Large language models (LLMs) have achieved impressive performance in numerous domains but often struggle to process lengthy inputs effectively and efficiently due to limited length generalization and attention's quadratic computational demands. Many sought to mitigate this by restricting the attention window within the pre-trained length. However, these methods introduce new issues such as ignoring the middle context and requiring additional training. To address these problems, we propose LongHeads, a training-free framework that enhances LLM's long context ability by unlocking multi-head attention's untapped potential. Instead of allowing each head to attend to the full sentence, which struggles with generalizing to longer sequences due to out-of-distribution (OOD) issues, we allow each head to process in-distribution length by selecting and attending to important context chunks. To…
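The chunk selection idea described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes chunks are scored by the dot product between a head's query and the mean key vector of each chunk, and that the first and last chunks are always kept. The function names (`chunk_spans`, `select_chunks`, `attend`) and the parameters `chunk_size` and `max_len` are hypothetical and chosen only for illustration.

```python
import math


def chunk_spans(seq_len, chunk_size):
    """Split token positions 0..seq_len-1 into consecutive (start, end) spans."""
    return [(s, min(s + chunk_size, seq_len)) for s in range(0, seq_len, chunk_size)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def select_chunks(query, keys, chunk_size, max_len):
    """Pick the chunk indices one head attends to, using at most max_len tokens.

    Assumptions (not taken from the source): a chunk is scored by
    query . mean(keys in chunk), and the first and last chunks are always kept.
    """
    spans = chunk_spans(len(keys), chunk_size)
    budget = max_len // chunk_size  # how many chunks fit in the trained length
    if budget < 2:
        raise ValueError("max_len must hold at least two chunks")
    if len(spans) <= budget:
        return list(range(len(spans)))  # short input: attend to everything
    forced = {0, len(spans) - 1}
    middle = range(1, len(spans) - 1)
    ranked = sorted(
        middle,
        key=lambda i: dot(query, mean_vector(keys[spans[i][0]:spans[i][1]])),
        reverse=True,
    )
    return sorted(forced | set(ranked[: budget - 2]))


def attend(query, keys, values, chunk_size, max_len):
    """Softmax attention for one head, restricted to its selected chunks."""
    spans = chunk_spans(len(keys), chunk_size)
    chosen = select_chunks(query, keys, chunk_size, max_len)
    idx = [t for c in chosen for t in range(*spans[c])]
    scale = math.sqrt(len(query))
    logits = [dot(query, keys[t]) / scale for t in idx]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * values[t][d] for w, t in zip(weights, idx)) / total for d in range(dim)]
    return out, chosen


# Toy input: 8 tokens in 4 chunks of 2, with a "trained length" of 6 tokens.
keys = [[0.0, 0.0]] * 2 + [[1.0, 0.0]] * 2 + [[0.0, 1.0]] * 2 + [[0.0, 0.0]] * 2
values = [[float(t)] for t in range(8)]
head_a = select_chunks([1.0, 0.0], keys, chunk_size=2, max_len=6)
head_b = select_chunks([0.0, 1.0], keys, chunk_size=2, max_len=6)
print(head_a, head_b)
```

In this toy input the two heads keep different middle chunks, so each head attends to only 6 of the 8 tokens while the heads together cover all of them.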

Code & Models

Repositories

lululuyi/longheads (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition