Correlation-Aware Select and Merge Attention for Efficient Fine-Tuning and Context Length Extension

Ning Wang; Zekun Li; Tongxin Bai; Guoqi Li

arXiv:2410.04211·cs.CL·October 8, 2024

Open Access

TL;DR

This paper introduces a novel correlation-aware attention mechanism that significantly extends context lengths in large language models with reduced computational costs, enabling efficient fine-tuning and inference on ultra-long sequences.

Contribution

It proposes a flexible, resource-efficient attention architecture with correlation-aware selection and merging, along with a new positional encoding technique for better generalization to unseen positions.

Findings

01. Llama2-7B is fine-tuned at a 32K sequence length on a single A100 GPU.

02. Context lengths are extended up to 1 million tokens with high accuracy and stable perplexity.

03. Resource use is reduced at least 64-fold compared to traditional attention methods.

Abstract

Modeling long sequences is crucial for various large-scale models; however, extending existing architectures to handle longer sequences presents significant technical and resource challenges. In this paper, we propose an efficient and flexible attention architecture that extends the context length of large language models while requiring fewer computational resources and less fine-tuning time than existing methods. Specifically, we introduce correlation-aware selection and merging mechanisms to facilitate efficient sparse attention. We also propose a novel data augmentation technique involving positional encodings to enhance generalization to unseen positions. The results are as follows: First, using a single A100, we fine-tune Llama2-7B with a sequence length of 32K, which is more efficient than other methods that rely on subsets for regression.…

Taxonomy

Topics: Advanced Data Compression Techniques · Advanced Image and Video Retrieval Techniques · Medical Image Segmentation Techniques

Methods: Softmax · Attention Is All You Need · Neural Tangent Kernel