NeoQA: Evidence-based Question Answering with Generated News Events
Max Glockner, Xiang Jiang, Leonardo F. R. Ribeiro, Iryna Gurevych, Markus Dreyer

TL;DR
NeoQA is a novel benchmark for evaluating evidence-based question answering in large language models, using fictional news events to ensure models rely solely on retrieved evidence rather than pretraining knowledge.
Contribution
The paper introduces NeoQA, a dataset and benchmark for controlled, evidence-based evaluation of LLMs. It uses generated fictional news data so that pretraining knowledge cannot influence the answers.
Findings
LLMs struggle with subtle mismatches between questions and evidence.
Models exhibit shortcut reasoning when key evidence is missing.
NeoQA provides a controlled environment for evaluating evidence reliance.
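The findings above suggest scoring answerable questions separately from questions whose key evidence is missing, so that shortcut answers do not hide inside one aggregate number. The following is a minimal sketch of such a scorer, not the paper's evaluation code. The `Item` fields, the `ABSTAIN` label, and the exact-match comparison are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical label a model is expected to output when the evidence
# does not support any answer (an assumption, not NeoQA's actual format).
ABSTAIN = "unanswerable"


@dataclass
class Item:
    question: str
    gold_answer: str
    evidence_sufficient: bool  # False when key evidence was withheld


def expected_label(item: Item) -> str:
    """The label counted as correct: the gold answer, or abstention."""
    return item.gold_answer if item.evidence_sufficient else ABSTAIN


def score(items, predictions):
    """Return accuracy separately for answerable and unanswerable items."""
    buckets = {"answerable": [0, 0], "unanswerable": [0, 0]}  # [correct, total]
    for item, pred in zip(items, predictions):
        key = "answerable" if item.evidence_sufficient else "unanswerable"
        buckets[key][1] += 1
        # Case-insensitive exact match; a real setup may need a looser match.
        if pred.strip().lower() == expected_label(item).strip().lower():
            buckets[key][0] += 1
    return {k: (c / n if n else None) for k, (c, n) in buckets.items()}
```

A model that answers every question confidently would do well in the `answerable` bucket and poorly in the `unanswerable` one, which is the pattern the shortcut-reasoning finding describes.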
Abstract
Evaluating Retrieval-Augmented Generation (RAG) in large language models (LLMs) is challenging because benchmarks can quickly become stale. Questions initially requiring retrieval may become answerable from pretraining knowledge as newer models incorporate more recent information during pretraining, making it difficult to distinguish evidence-based reasoning from recall. We introduce NeoQA (News Events for Out-of-training Question Answering), a benchmark designed to address this issue. To construct NeoQA, we generated timelines and knowledge bases of fictional news events and entities along with news articles and Q&A pairs to prevent LLMs from leveraging pretraining knowledge, ensuring that no prior evidence exists in their training data. We propose our dataset as a new platform for evaluating evidence-based question answering, as it requires LLMs to generate responses exclusively from…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Computational and Text Analysis Methods
