Anchor Attention, Small Cache: Code Generation with Large Language Models

Xiangyu Zhang; Yu Zhou; Guang Yang; Harald C. Gall; Taolue Chen

arXiv:2411.06680·cs.SE·November 12, 2024

TL;DR

This paper introduces AnchorCoder, an anchor-based attention approach for code-generating large language models. It cuts KV cache memory during generation by at least 70% while preserving most of the model's performance, which lowers the computational and environmental cost of deployment.

Contribution

The paper presents a token-wise and layer-wise anchor attention approach that compresses contextual information and reduces KV cache requirements by at least 70% in code generation models.

Findings

01. Achieves at least 70% reduction in KV cache requirements.

02. Maintains the majority of model performance.

03. Effective across multiple benchmark datasets.
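
To put the 70% figure in concrete terms, here is a back-of-the-envelope sketch that is not taken from the paper. It relies on the usual estimate that a KV cache holds one key vector and one value vector per layer, per head, per token. The model shape (32 layers, 32 heads, head size 128, 4096 tokens, 16-bit values) and the function names `kv_cache_bytes` and `after_reduction` are hypothetical, chosen only for illustration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Estimate KV cache size: one key and one value vector per layer, head, and token."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * seq_len * batch_size


def after_reduction(full_bytes, reduction_percent):
    """Bytes remaining after cutting the cache by reduction_percent (integer percent)."""
    return full_bytes * (100 - reduction_percent) // 100


# Hypothetical model shape, for illustration only (not from the paper).
full = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)
reduced = after_reduction(full, 70)

print(f"full cache:    {full / 2**30:.2f} GiB")
print(f"70% reduction: {reduced / 2**30:.2f} GiB")
```

Under these assumed dimensions the full cache is 2 GiB per sequence, and a 70% reduction leaves roughly 0.6 GiB.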

Abstract

The development of large language models (LLMs) has revolutionized automated code generation. However, their high demand for computational resources has hindered broader deployment and raised environmental concerns. A common strategy for reducing computational demands is to cache the Key-Value (KV) states of the attention mechanism, which is used by most mainstream LLMs. Caching avoids repeated attention computations but brings significant memory overhead. Current practices in NLP often use sparse attention, which may, unfortunately, lead to substantial inaccuracies, or hallucinations, in code generation tasks. In this paper, we analyze the distribution of attention weights within code generation models via an empirical study, uncovering a sparsity pattern, i.e., the aggregation of information at specific anchor points. Based on this observation, we propose a novel…
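
The KV caching trade-off mentioned in the abstract can be illustrated with a minimal sketch. This is generic single-head attention decoding, not the paper's anchor attention. The projections `project_kv` and `make_query` are toy, hypothetical stand-ins for learned weight matrices, and the scalar inputs stand in for token embeddings. The sketch assumes nothing beyond the Python standard library.

```python
import math


def project_kv(x, counter):
    """Toy stand-in for the key/value projections; counts how often it runs."""
    counter[0] += 1
    return [x, 1.0], [2.0 * x, -x]


def make_query(x):
    """Toy stand-in for the query projection."""
    return [x, 0.5]


def attend(query, keys, values):
    """Scaled dot-product attention for one query over all stored keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]


def decode_with_cache(inputs):
    """Project each token once and keep its key/value for later steps."""
    counter, keys, values, outputs = [0], [], [], []
    for x in inputs:
        k, v = project_kv(x, counter)
        keys.append(k)
        values.append(v)
        outputs.append(attend(make_query(x), keys, values))
    return outputs, counter[0], len(keys)


def decode_without_cache(inputs):
    """Re-project the whole prefix at every step; nothing is stored."""
    counter, outputs = [0], []
    for t, x in enumerate(inputs):
        pairs = [project_kv(p, counter) for p in inputs[: t + 1]]
        keys = [k for k, _ in pairs]
        values = [v for _, v in pairs]
        outputs.append(attend(make_query(x), keys, values))
    return outputs, counter[0]


inputs = [0.1, 0.4, -0.3, 0.8]  # hypothetical scalar "token embeddings"
cached_out, cached_calls, cache_len = decode_with_cache(inputs)
plain_out, plain_calls = decode_without_cache(inputs)
print(cached_calls, plain_calls, cache_len)
```

For the four toy inputs, the cached path runs the key/value projection 4 times and the uncached path 10 times, with identical outputs. The price is a cache that grows by one key/value pair per generated token, which is the memory overhead the paper targets.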

Code & Models

Repositories

NUAAZXY/Anchor_Coder · jax · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems