BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression

Yuankai Li; Jia-Chen Gu; Di Wu; Kai-Wei Chang; Nanyun Peng

arXiv:2410.15277 · cs.CL · February 18, 2025

TL;DR

BRIEF introduces a lightweight method that compresses retrieved documents into dense summaries to improve multi-hop reasoning efficiency and accuracy in retrieval-augmented generation, reducing latency and costs.

Contribution

The paper proposes BRIEF, a novel approach that uses synthetic data to learn compression of documents for enhanced multi-hop reasoning in LLMs, outperforming state-of-the-art baselines.

Findings

01 BRIEF doubles compression rate over baselines.

02 Achieves 3% higher EM and 4% higher F1 on HotpotQA.

03 Generates concise summaries comparable to GPT-3.5.

Abstract

Retrieval-augmented generation (RAG) can supplement large language models (LLMs) by integrating external knowledge. However, as the number of retrieved documents increases, the input length to LLMs grows linearly, causing a dramatic increase in latency and a degradation in long-context understanding. This is particularly serious for multi-hop questions that require a chain of reasoning across documents. To accelerate inference, reduce costs, and minimize distractions, this paper presents BRIEF (Bridging Retrieval and Inference through Evidence Fusion), a lightweight approach that performs query-aware multi-hop reasoning by compressing retrieved documents into highly dense textual summaries to integrate into in-context RAG. To enable learning compression for multi-hop reasoning, we curate synthetic data by extracting atomic propositions that encapsulate distinct factoids from the source…
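
As a rough illustration of the pipeline the abstract describes, the sketch below places a query-aware compressor between retrieval and the reader, and measures the resulting compression rate. It is not the authors' implementation: `keyword_compressor` is a toy keyword-overlap stand-in for BRIEF's learned compressor, the whitespace token count stands in for a real tokenizer, and the compression rate is assumed here to be original length divided by summary length. All function names are hypothetical.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Whitespace token count; a stand-in for a real tokenizer."""
    return len(text.split())


def compression_rate(documents: List[str], summary: str) -> float:
    """Original context length divided by summary length (assumed definition)."""
    compressed = count_tokens(summary)
    if compressed == 0:
        raise ValueError("summary is empty")
    return sum(count_tokens(d) for d in documents) / compressed


def _content_words(text: str) -> set:
    """Lowercased words longer than three characters, punctuation stripped."""
    words = (w.lower().strip("?.,") for w in text.split())
    return {w for w in words if len(w) > 3}


def keyword_compressor(query: str, documents: List[str]) -> str:
    """Toy stand-in for a learned compressor: keep sentences that share a
    content word with the query. Hypothetical, for illustration only."""
    query_words = _content_words(query)
    kept = []
    for doc in documents:
        for sentence in doc.split(". "):
            if query_words & _content_words(sentence):
                kept.append(sentence.strip().rstrip(".") + ".")
    return " ".join(kept)


def build_prompt(query: str, context: str) -> str:
    """Assemble an in-context RAG prompt from a query and its context."""
    return f"Context: {context}\nQuestion: {query}\nAnswer:"


def compress_then_read(
    query: str,
    documents: List[str],
    compressor: Callable[[str, List[str]], str],
    reader: Callable[[str], str],
) -> str:
    """Compress the retrieved documents, then pass the short prompt to the reader."""
    summary = compressor(query, documents)
    return reader(build_prompt(query, summary))


if __name__ == "__main__":
    docs = [
        "Alice was born in Paris. Alice likes tea.",
        "Paris is the capital of France. Bananas are yellow.",
    ]
    question = "Which country is Paris the capital of?"
    summary = keyword_compressor(question, docs)
    print(summary)
    print(compression_rate(docs, summary))
```

Without compression, the prompt length is the sum of the document lengths, so it grows linearly with the number of retrieved documents. With a compressor in the loop, the reader sees only the summary. Keeping the compressor and the reader as plain callables means either can be swapped for a real model without changing the rest of the sketch.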

Code & Models

Repositories

JasonForJoy/BRIEF
PyTorch · Official

Videos

BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression · Underline

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies

Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Attention Dropout · Softmax · Multi-Head Attention · Linear Warmup With Cosine Annealing · Adam