Retrieval-augmented Decoding for Improving Truthfulness in Open-ended Generation

Manh Nguyen; Sunil Gupta; Hung Le

arXiv:2508.02184 · cs.LG · March 17, 2026

Open Access

TL;DR

This paper introduces Retrieval-Augmented Decoding (RAD), a lightweight, context-aware method that improves the truthfulness of large language models during inference by leveraging a small, annotated reference space for retrieval-based logit shaping.

Contribution

RAD is a novel decoding strategy that uses a small annotated grounding space to enhance factual accuracy without retraining or extensive fine-tuning of LLMs.

Findings

01. RAD outperforms strong baselines across four benchmarks.

02. RAD demonstrates robust generalization across different tasks.

03. RAD requires only a few annotated examples for effective grounding.

Abstract

Ensuring truthfulness in large language models (LLMs) remains a critical challenge for reliable text generation. While supervised fine-tuning and reinforcement learning with human feedback have shown promise, they require a substantial amount of annotated data and computational resources, limiting scalability. In contrast, decoding-time interventions offer lightweight alternatives without model retraining. However, existing decoding strategies often face issues like prompt sensitivity, limited generalization, or dependence on internal model states. We propose Retrieval-Augmented Decoding (RAD), a context-aware adaptive decoding method that leverages a compact reference grounding space built from as few as 10 annotated examples and comprising pairs of context embeddings and next-token logits from truthful responses, to enable retrieval-based logit shaping during inference. At each…
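
The grounding space described above pairs context embeddings with next-token logits from truthful responses. The following is a minimal sketch of what retrieval-based logit shaping over such a space could look like. It assumes cosine-similarity retrieval, softmax weighting of the retrieved entries, and additive mixing with the model's logits. None of these choices is confirmed by the abstract, and every name and parameter (`shape_logits`, `k`, `alpha`, `temp`) is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def shape_logits(model_logits, context_emb, grounding_space, k=2, alpha=0.5, temp=0.1):
    """Hypothetical retrieval-based logit shaping (not the paper's exact rule).

    grounding_space: non-empty list of (context_embedding, next_token_logits)
    pairs taken from truthful responses. k, alpha, and temp are assumed knobs.
    """
    # Rank stored contexts by similarity to the current context embedding
    # and keep the k most similar entries.
    scored = sorted(
        ((cosine(context_emb, emb), logits) for emb, logits in grounding_space),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Softmax over the similarities gives the retrieval weights.
    top = max(s for s, _ in scored)
    weights = [math.exp((s - top) / temp) for s, _ in scored]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted average of the retrieved next-token logits.
    retrieved = [
        sum(w * logits[i] for w, (_, logits) in zip(weights, scored))
        for i in range(len(model_logits))
    ]
    # Additive mix of the model's logits and the retrieved logits.
    return [m + alpha * r for m, r in zip(model_logits, retrieved)]


# Toy grounding space: two stored contexts over a three-token vocabulary.
space = [
    ([1.0, 0.0], [0.0, 4.0, 0.0]),
    ([0.0, 1.0], [4.0, 0.0, 0.0]),
]
shaped = shape_logits([1.0, 1.0, 1.0], [0.9, 0.1], space, k=1, alpha=0.5)
best_token = max(range(len(shaped)), key=lambda i: shaped[i])
```

In this toy case the model's logits are uniform, so the retrieved entry alone decides the next token. With `alpha=0` the function returns the model's logits unchanged, which makes the shaping strength easy to ablate.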

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Text Readability and Simplification