Anchor Attention, Small Cache: Code Generation with Large Language Models
Xiangyu Zhang, Yu Zhou, Guang Yang, Harald C. Gall, Taolue Chen

TL;DR
This paper introduces AnchorCoder, a novel attention mechanism for code-generating large language models that cuts KV cache memory by at least 70% while retaining most of the model's performance, easing the computational and environmental cost of deployment.
Contribution
It presents a new token-wise and layer-wise anchor attention approach that compresses contextual information and reduces KV cache needs by at least 70% in code generation models.
Findings
Achieves at least 70% reduction in KV cache requirements.
Maintains the majority of model performance.
Effective across multiple benchmark datasets.
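To give a sense of scale for the 70% figure, the sketch below estimates KV cache size with the standard formula (two tensors, keys and values, per layer, head, and token). The model dimensions are hypothetical and are not taken from the paper; the function name `kv_cache_bytes` is likewise only illustrative.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Bytes needed to cache keys and values for every layer and token."""
    # Factor 2: one tensor for keys and one for values.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical configuration (an assumption, not from the paper):
# 32 layers, 32 KV heads, head dimension 128, fp16 values, 4096 tokens.
full = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)

# What remains if 70% of the cache is eliminated (integer arithmetic).
reduced = full * (100 - 70) // 100

print(f"full cache:          {full / 2**30:.2f} GiB")
print(f"after 70% reduction: {reduced / 2**30:.2f} GiB")
```

Because the cache grows linearly with sequence length and batch size, the same percentage saving translates into larger absolute savings for long prompts or batched serving.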
Abstract
The development of large language models (LLMs) has revolutionized automated code generation. However, their high demand for computational resources has hindered broader deployment and raised environmental concerns. A common strategy for reducing computational demands is to cache the Key-Value (KV) states of the attention mechanism, which is adopted predominantly by mainstream LLMs. Caching reduces the need for repeated attention computations but brings significant memory overhead. Current practices in NLP often use sparse attention, which may, unfortunately, lead to substantial inaccuracies, or hallucinations, in code generation tasks. In this paper, we analyze the distribution of attention weights within code generation models via an empirical study, uncovering a sparsity pattern, i.e., the aggregation of information at specific anchor points. Based on this observation, we propose a novel…
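The trade-off described in the abstract, fewer repeated computations in exchange for more memory, can be illustrated with a minimal single-head sketch in plain Python. This shows ordinary KV caching only, not AnchorCoder's anchor attention; the class `KVCache`, the helper names, and the toy dimensions are all assumptions made for illustration.

```python
import math
import random


def project(x, w):
    """Multiply vector x by matrix w, given as a list of rows."""
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in w]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


class KVCache:
    """Stores one key vector and one value vector per processed token."""

    def __init__(self):
        self.keys, self.values = [], []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def __len__(self):
        return len(self.keys)


def random_matrix(dim):
    return [[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]


random.seed(0)
dim = 4
w_q, w_k, w_v = random_matrix(dim), random_matrix(dim), random_matrix(dim)
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(6)]

# With a cache: each step projects only the new token's key and value.
cache = KVCache()
cached_out = []
for x in tokens:
    cache.append(project(x, w_k), project(x, w_v))
    cached_out.append(attend(project(x, w_q), cache.keys, cache.values))

# Without a cache: every step re-projects the whole prefix.
recomputed_out = []
for t in range(len(tokens)):
    prefix = tokens[: t + 1]
    keys = [project(x, w_k) for x in prefix]
    values = [project(x, w_v) for x in prefix]
    recomputed_out.append(attend(project(tokens[t], w_q), keys, values))

print("outputs match:", cached_out == recomputed_out)
print("cached key/value pairs:", len(cache))
```

Both loops produce the same attention outputs, but the cached version holds one key/value pair per token in memory, which is the overhead that cache-compression methods aim to shrink.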
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
