PEAR: Position-Embedding-Agnostic Attention Re-weighting Enhances Retrieval-Augmented Generation with Zero Inference Overhead

Tao Tan; Yining Qian; Ang Lv; Hongzhan Lin; Songhao Wu; Yongbo Wang; Feng Wang; Jingtong Wu; Xin Lu; Rui Yan

arXiv:2409.19745·cs.CL·October 8, 2024

Open Access

TL;DR

PEAR improves retrieval-augmented generation by re-weighting attention heads to enhance context awareness without increasing inference time or memory, applicable across different position embeddings.

Contribution

PEAR introduces a position-embedding-agnostic method that re-weights attention heads to boost context awareness in LLMs during RAG tasks with zero inference overhead.

Findings

01. Outperforms baselines in accuracy and efficiency on RAG tasks
02. Reduces suppression by certain attention heads, improving context copying
03. Works independently of position embedding algorithms

Abstract

Large language models (LLMs) enhanced with retrieval-augmented generation (RAG) have introduced a new paradigm for web search. However, the limited context awareness of LLMs degrades their performance on RAG tasks. Existing methods to enhance context awareness are often inefficient, incurring time or memory overhead during inference, and many are tailored to specific position embeddings. In this paper, we propose Position-Embedding-Agnostic attention Re-weighting (PEAR), which enhances the context awareness of LLMs with zero inference overhead. Specifically, on a proxy task focused on context copying, we first detect heads that suppress the model's context awareness, thereby diminishing RAG performance. To weaken the impact of these heads, we re-weight their outputs with learnable coefficients. The LLM (with frozen parameters) is optimized by adjusting these coefficients to minimize…
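The re-weighting step described in the abstract can be illustrated with a toy sketch. It scales each head's output by a scalar coefficient before the heads are summed, and then shows that the same scalars can be folded into per-head output projections ahead of time. The folding step is an assumption about one way a zero-overhead deployment could work; the excerpt above does not confirm it. All names (`combine_heads`, `fold_coefficients`) and all numeric values are hypothetical, and the head detection and coefficient training are not shown.

```python
# Toy sketch, not the authors' implementation. Names and values are hypothetical.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def combine_heads(head_outputs, out_projections, coefficients):
    """Project each head's output, scale it by that head's coefficient, and sum."""
    dim = len(out_projections[0])  # number of rows in one head's projection
    total = [0.0] * dim
    for head_out, proj, coeff in zip(head_outputs, out_projections, coefficients):
        projected = matvec(proj, head_out)
        total = [t + coeff * p for t, p in zip(total, projected)]
    return total


def fold_coefficients(out_projections, coefficients):
    """Assumption: bake each scalar into its head's projection weights offline."""
    return [
        [[coeff * w for w in row] for row in proj]
        for proj, coeff in zip(out_projections, coefficients)
    ]


# Two heads with 2-dimensional outputs, projected into a 3-dimensional stream.
head_outputs = [[1.0, 2.0], [0.5, -1.0]]
out_projections = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[2.0, 1.0], [0.0, -1.0], [1.0, 0.0]],
]
# Hypothetical coefficients: head 0 left unchanged, head 1 down-weighted.
coefficients = [1.0, 0.25]

reweighted = combine_heads(head_outputs, out_projections, coefficients)

# Folding the scalars into the weights gives the same result with no extra
# multiply at inference time.
folded = fold_coefficients(out_projections, coefficients)
folded_out = combine_heads(head_outputs, folded, [1.0, 1.0])
```

Because each coefficient is a scalar applied to a linear projection, `reweighted` and `folded_out` agree, which is consistent with the claim that re-weighting need not add inference time or memory.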

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Attention Dropout · WordPiece · Linear Warmup With Linear Decay · Linear Layer · Weight Decay · Byte Pair Encoding · BERT · Softmax · Dropout