Say Less, Mean More: Leveraging Pragmatics in Retrieval-Augmented Generation

Haris Riaz; Ellen Riloff; Mihai Surdeanu

arXiv:2502.17839 · cs.CL · February 28, 2025

Open Access

TL;DR

This paper introduces an unsupervised method that applies pragmatic principles to retrieval-augmented generation: it selects the retrieved sentences most relevant to the question and highlights them in place, improving relative question answering accuracy by up to 19.7% on PubHealth and 10% on ARC-Challenge over a conventional RAG system.

Contribution

The paper presents a novel, unsupervised approach that enhances RAG systems by identifying and emphasizing relevant context sentences without altering the original content.

Findings

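01. Up to 19.7% relative accuracy improvement on PubHealth over a conventional RAG system

02. Up to 10% relative accuracy improvement on ARC-Challenge over a conventional RAG system

03. Consistent gains across three question answering tasks (ARC-Challenge, PubHealth, PopQA) and five LLMs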

Abstract

We propose a simple, unsupervised method that injects pragmatic principles in retrieval-augmented generation (RAG) frameworks such as Dense Passage Retrieval to enhance the utility of retrieved contexts. Our approach first identifies which sentences in a pool of documents retrieved by RAG are most relevant to the question at hand, cover all the topics addressed in the input question and no more, and then highlights these sentences within their context, before they are provided to the LLM, without truncating or altering the context in any other way. We show that this simple idea brings consistent improvements in experiments on three question answering tasks (ARC-Challenge, PubHealth and PopQA) using five different LLMs. It notably enhances relative accuracy by up to 19.7% on PubHealth and 10% on ARC-Challenge compared to a conventional RAG system.
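
The select-then-highlight idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: plain word overlap stands in for whatever relevance scoring the paper uses, the greedy loop is only one way to approximate "cover all the topics in the question and no more", and the `<evidence>` markers, the stopword list, and all function names are hypothetical choices made for this example.

```python
import re

# Hypothetical, deliberately small stopword list for this sketch.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "is", "are", "was", "what",
             "which", "who", "did", "do", "does", "to", "and", "for", "by"}


def content_words(text):
    """Return the set of lowercased word tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}


def split_sentences(passage):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", passage.strip()) if s]


def select_evidence(question, sentences):
    """Greedily pick sentences that cover the question's content words.

    Stops as soon as every question word is covered or no remaining
    sentence adds coverage, so no superfluous sentence is selected.
    """
    uncovered = content_words(question)
    chosen = []
    while uncovered:
        best_i, best_gain = None, 0
        for i, sentence in enumerate(sentences):
            if i in chosen:
                continue
            gain = len(uncovered & content_words(sentence))
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_i is None:  # nothing left that adds coverage
            break
        chosen.append(best_i)
        uncovered -= content_words(sentences[best_i])
    return sorted(chosen)


def highlight(passage, question,
              open_tag="<evidence>", close_tag="</evidence>"):
    """Wrap the selected sentences in markers; keep every other sentence."""
    sentences = split_sentences(passage)
    keep = set(select_evidence(question, sentences))
    return " ".join(
        f"{open_tag}{s}{close_tag}" if i in keep else s
        for i, s in enumerate(sentences)
    )


if __name__ == "__main__":
    passage = ("Marie Curie was born in Warsaw. "
               "She won the Nobel Prize in Physics in 1903. "
               "Her daughter also became a scientist.")
    print(highlight(passage, "Which prize did Marie Curie win?"))
```

The highlighted passage, rather than a truncated one, would then be placed in the LLM prompt, which matches the abstract's point that the context is not shortened or otherwise altered.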

Taxonomy

Topics: Topic Modeling · Information Retrieval and Search Behavior · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Weight Decay · Dense Connections · Attention Dropout · Linear Layer · Layer Normalization · Byte Pair Encoding · Residual Connection · Linear Warmup With Linear Decay