Context Embeddings for Efficient Answer Generation in RAG

David Rau; Shuai Wang; Hervé Déjean; Stéphane Clinchant

arXiv:2407.09252·cs.CL·October 30, 2024·1 citation


TL;DR

This paper introduces COCOM, a context compression technique for RAG that significantly speeds up answer generation by compressing long retrieved contexts into a small number of context embeddings, with an adjustable compression rate that trades decoding speed against answer quality.
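To make the quality-speed trade-off concrete, here is a back-of-the-envelope sketch of how a compression rate shrinks the decoder's input. It assumes that a context of n tokens is compressed into ceil(n / rate) embeddings while the question stays uncompressed; the helper name and the example numbers (five 128-token passages, a 20-token question) are hypothetical and are not figures reported in the paper.

```python
import math


def decoder_input_length(context_lengths, question_length, rate=1):
    """Hypothetical helper: number of positions the decoder must process.

    Assumption: each context of n tokens becomes ceil(n / rate) embeddings;
    rate=1 means no compression (the raw tokens are fed in unchanged).
    """
    compressed = sum(math.ceil(n / rate) for n in context_lengths)
    return compressed + question_length


contexts = [128] * 5  # hypothetical: five retrieved passages of 128 tokens each
for rate in (1, 4, 16, 128):
    print(rate, decoder_input_length(contexts, question_length=20, rate=rate))
# prints: "1 660", "4 180", "16 60", "128 25" (one pair per line)
```

Higher rates leave fewer positions for the decoder to attend over, at some cost in answer quality. These counts are input sizes only; actual wall-clock speed-ups depend on hardware, batch size, and answer length, so they should not be read as speed-up estimates.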

Contribution

COCOM is a novel context compression method that efficiently handles multiple contexts, improving decoding speed and answer quality over previous approaches.

Findings

01. Achieves up to a 5.69× speed-up in decoding time.

02. Outperforms existing context compression methods in both answer quality and efficiency.

03. Handles multiple contexts effectively, enabling faster answer generation.

Abstract

Retrieval-Augmented Generation (RAG) helps overcome the limited knowledge of LLMs by extending the input with external information. As a consequence, the contextual inputs to the model become much longer, which slows down decoding and directly increases the time a user has to wait for an answer. We address this challenge by presenting COCOM, an effective context compression method that reduces long contexts to only a handful of Context Embeddings, speeding up generation by a large margin. Our method allows for different compression rates, trading off decoding time for answer quality. Compared to earlier methods, COCOM handles multiple contexts more effectively, significantly reducing decoding time for long inputs. Our method demonstrates a speed-up of up to 5.69× while achieving higher performance than existing efficient context compression methods.
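The sketch below illustrates the general shape of compressing multiple contexts, under the assumption (not spelled out in the abstract) that each retrieved context is compressed independently and the results are concatenated ahead of the question. COCOM learns its compressor; here, mean pooling over fixed-size chunks of token embeddings is a deliberately simple stand-in, and all names, dimensions, and random inputs are hypothetical.

```python
import random


def compress_context(token_embs, rate):
    """Stand-in compressor (NOT COCOM's learned one): mean-pool each chunk of
    `rate` consecutive token embeddings into one vector, giving
    ceil(len(token_embs) / rate) context embeddings."""
    dim = len(token_embs[0])
    out = []
    for start in range(0, len(token_embs), rate):
        chunk = token_embs[start:start + rate]
        out.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return out


def build_decoder_input(contexts, question_embs, rate):
    """Compress each context independently, then append the (uncompressed)
    question embeddings after all compressed contexts."""
    seq = []
    for ctx in contexts:
        seq.extend(compress_context(ctx, rate))
    seq.extend(question_embs)
    return seq


def rand_vec(dim):
    """Hypothetical random token embedding of size `dim`."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


random.seed(0)
DIM = 8  # hypothetical embedding size
contexts = [[rand_vec(DIM) for _ in range(n)] for n in (128, 100, 64)]
question = [rand_vec(DIM) for _ in range(12)]
seq = build_decoder_input(contexts, question, rate=16)
print(len(seq))  # 8 + 7 + 4 + 12 = 31
```

Because each context is compressed on its own, the decoder's input grows with the total context length divided by the rate rather than with the raw token count. A learned compressor would replace `compress_context` while keeping the same interface.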


Taxonomy

Topics: Topic Modeling · Speech and dialogue systems