Cognitive-Aligned Document Selection for Retrieval-augmented Generation

Bingyu Wan; Fuxi Zhang; Zhongpeng Qi; Jiayi Ding; Jijun Li; Baoshi Fan; Yijia Zhang; Jun Zhang

arXiv:2502.11770·cs.AI·February 18, 2025

TL;DR

This paper introduces GGatrieval, a method that dynamically refines queries and filters documents in retrieval-augmented generation to improve the factual accuracy and verifiability of large language model outputs.

Contribution

The paper presents a novel grounded alignment and dynamic semantic compensation mechanism for better document retrieval in RAG systems, enhancing factual correctness.

Findings

01. Achieves state-of-the-art results on the ALCE benchmark.

02. Significantly improves the supportiveness of retrieved documents.

03. Enhances the factual accuracy of generated responses.

Abstract

Large language models (LLMs) inherently hallucinate, since the accuracy of generated text cannot be guaranteed by their parametric knowledge alone. Although retrieval-augmented generation (RAG) systems improve the accuracy and reliability of generative models by incorporating external documents, in practice the retrieved documents often fail to adequately support the model's responses. To address this issue, we propose GGatrieval (Fine-Grained Grounded Alignment Retrieval for verifiable generation), which leverages an LLM to dynamically update queries and filter high-quality, reliable retrieval documents. Specifically, we parse the user query into its syntactic components and perform fine-grained grounded alignment with the retrieved documents. For query components that cannot be individually aligned, we propose…
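
The component-wise alignment step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: GGatrieval uses an LLM for parsing and alignment, whereas the sketch below substitutes a comma/"and" split for syntactic parsing and exact word overlap for grounded alignment. All names (`split_components`, `is_aligned`, `alignment_score`, `select_documents`) and the `threshold` parameter are hypothetical, and the step for components that cannot be aligned is omitted because the abstract is truncated before describing it.

```python
def split_components(query: str) -> list[str]:
    """Hypothetical stand-in for syntactic parsing: split on commas and ' and '."""
    parts = []
    for chunk in query.replace(" and ", ",").split(","):
        chunk = chunk.strip().lower()
        if chunk:
            parts.append(chunk)
    return parts


def is_aligned(component: str, document: str) -> bool:
    """Toy alignment test (assumption): every word of the component occurs in the document."""
    doc_words = set(document.lower().split())
    return all(word in doc_words for word in component.split())


def alignment_score(components: list[str], document: str) -> float:
    """Fraction of query components that the document aligns with."""
    if not components:
        return 0.0
    hits = sum(is_aligned(c, document) for c in components)
    return hits / len(components)


def select_documents(query: str, documents: list[str], threshold: float = 0.5) -> list[str]:
    """Keep documents whose score meets the threshold, best-supported first."""
    components = split_components(query)
    scored = [(alignment_score(components, d), d) for d in documents]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in kept]


if __name__ == "__main__":
    docs = [
        "paris is the capital of france",
        "the population of paris is about two million and paris is the capital of france",
        "berlin is in germany",
    ]
    for doc in select_documents("capital of france, population of paris", docs):
        print(doc)
```

Scoring per component rather than per whole query is the point of the illustration: a document that supports only part of the query is ranked below one that supports every part, and a document that supports none is filtered out.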

Taxonomy

Topics: Semantic Web and Ontologies · Information Retrieval and Search Behavior · Topic Modeling