TL;DR
Faithfulness-QA is a 99,094-sample dataset built through counterfactual entity substitution to train and evaluate the context-faithfulness of Retrieval-Augmented Generation (RAG) models.
Contribution
It introduces a new dataset and an automated construction pipeline that create controlled knowledge conflicts between the retrieved context and a model's parametric memory, for training and evaluating models on context-grounded answers.
Findings
The dataset contains 99,094 samples retained after automated quality filtering.
It enables training models to prefer context over internal knowledge.
The dataset and pipeline are publicly available for research use.
Abstract
Retrieval-Augmented Generation (RAG) models frequently produce answers grounded in parametric memory rather than the retrieved context, undermining the core promise of retrieval augmentation. A fundamental obstacle to fixing this unfaithfulness is the lack of training data that explicitly requires models to prefer context over internal knowledge. We introduce Faithfulness-QA, a large-scale dataset of 99,094 samples constructed through counterfactual entity substitution. Starting from two established extractive QA benchmarks, SQuAD and TriviaQA, we automatically identify answer-bearing named entities in each context, replace them with type-consistent alternatives drawn from a curated bank of 76,953 entities, and thereby manufacture controlled knowledge conflicts between context and parametric memory. Rigorous quality filtering ensures 100% pass rates across four automated checks on…
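The substitution step described in the abstract can be illustrated with a minimal Python sketch. It assumes each sample already carries its answer string and an entity-type label (a real pipeline would obtain these from a named-entity recognizer), and it uses a tiny hypothetical `ENTITY_BANK` in place of the curated 76,953-entity bank. `QASample` and `substitute` are illustrative names, not the authors' code, and the two consistency checks at the end are simple examples, not the paper's four quality filters.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class QASample:
    context: str
    question: str
    answer: str       # answer-bearing entity, assumed to appear verbatim in context
    answer_type: str  # entity type label, assumed to come from an upstream NER step


# Hypothetical toy entity bank keyed by type; the real bank has 76,953 entities.
ENTITY_BANK: Dict[str, List[str]] = {
    "PERSON": ["Ada Lovelace", "Niels Bohr", "Grace Hopper"],
    "GPE": ["Lisbon", "Nairobi", "Osaka"],
}


def substitute(sample: QASample, bank: Dict[str, List[str]],
               rng: random.Random) -> Optional[QASample]:
    """Swap the answer entity for a type-consistent alternative.

    Returns None when no controlled conflict can be built for this sample.
    """
    if sample.answer not in sample.context:
        return None
    # Same-type candidates that differ from the answer and are absent from the
    # context, so the new answer is unambiguous.
    candidates = [e for e in bank.get(sample.answer_type, [])
                  if e != sample.answer and e not in sample.context]
    if not candidates:
        return None
    new_answer = rng.choice(candidates)
    # Naive replacement of every mention; a real pipeline would edit NER spans.
    new_context = sample.context.replace(sample.answer, new_answer)
    # Illustrative consistency checks (hypothetical, not the paper's filters):
    # the new answer must be extractable and the original answer must be gone.
    if new_answer not in new_context or sample.answer in new_context:
        return None
    return QASample(new_context, sample.question, new_answer, sample.answer_type)


example = QASample(
    context="The Eiffel Tower is located in Paris, the capital of France.",
    question="In which city is the Eiffel Tower located?",
    answer="Paris",
    answer_type="GPE",
)
counterfactual = substitute(example, ENTITY_BANK, random.Random(0))
print(counterfactual)
```

Because the type label constrains the replacement, the counterfactual context stays fluent (a city is swapped for a city), while its answer now contradicts what a model has likely memorized about the Eiffel Tower. A model that still answers "Paris" is relying on parametric memory instead of the context.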
