Rational Synthesizers or Heuristic Followers? Analyzing LLMs in RAG-based Question-Answering
Atharv Naphade

TL;DR
This paper investigates how Large Language Models (LLMs) in Retrieval-Augmented Generation (RAG) systems process conflicting evidence. It finds that they act as heuristic followers rather than rational synthesizers, which has implications for RAG system design.
Contribution
The study introduces GroupQA, a new dataset for analyzing evidence aggregation in LLMs, and provides insights into their behavior with conflicting evidence and explanation faithfulness.
Findings
Models favor first-presented evidence over last
Paraphrased arguments can be more persuasive than independent support
Larger models are more resistant to adapting to presented evidence
Abstract
Retrieval-Augmented Generation (RAG) is the prevailing paradigm for grounding Large Language Models (LLMs), yet the mechanisms governing how models integrate groups of conflicting retrieved evidence remain opaque. Does an LLM answer a certain way because the evidence is factually strong, because of a prior belief, or merely because the evidence is repeated frequently? To answer this, we introduce GroupQA, a curated dataset of 1,635 controversial questions paired with 15,058 diversely sourced evidence documents, annotated for stance and qualitative strength. Through controlled experiments, we characterize group-level evidence aggregation dynamics: paraphrasing an argument can be more persuasive than providing distinct independent support; models favor evidence presented first rather than last; and larger models are increasingly resistant to adapting to presented evidence. Additionally, we find that…
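The abstract does not say how position effects or adaptation to evidence are scored. The following is a minimal sketch, not the paper's method. It assumes each trial records the stance of the first- and last-placed documents and the stance of the model's answer, and it computes two simple rates: how often the answer sides with the first versus the last document when they disagree, and how often a model abandons its closed-book answer in favor of the evidence. The names `Trial`, `position_bias`, and `adaptation_rate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # Hypothetical record: stance of the document placed first / last in the
    # context, and the stance the model's final answer supports.
    first_stance: str
    last_stance: str
    answer_stance: str


def position_bias(trials):
    """Return (primacy_rate, recency_rate) over trials whose first and last
    documents take different stances; (0.0, 0.0) if there are none."""
    conflicting = [t for t in trials if t.first_stance != t.last_stance]
    if not conflicting:
        return 0.0, 0.0
    n = len(conflicting)
    # Fraction of answers agreeing with the first-placed document.
    primacy = sum(t.answer_stance == t.first_stance for t in conflicting) / n
    # Fraction of answers agreeing with the last-placed document.
    recency = sum(t.answer_stance == t.last_stance for t in conflicting) / n
    return primacy, recency


def adaptation_rate(records):
    """records: (closed_book_answer, answer_with_evidence, evidence_stance).
    Only cases where the evidence contradicts the closed-book answer count;
    returns the fraction in which the model switches to the evidence stance."""
    contradicted = [r for r in records if r[2] != r[0]]
    if not contradicted:
        return 0.0
    adopted = sum(with_ev == ev for _, with_ev, ev in contradicted)
    return adopted / len(contradicted)


if __name__ == "__main__":
    trials = [
        Trial("yes", "no", "yes"),
        Trial("no", "yes", "no"),
        Trial("yes", "no", "no"),
        Trial("no", "yes", "no"),
    ]
    print(position_bias(trials))  # (0.75, 0.25)
    print(adaptation_rate([("yes", "no", "no"), ("yes", "yes", "no"), ("no", "no", "no")]))  # 0.5
```

Under this framing, a primacy rate well above the recency rate would correspond to the first-over-last preference described above, and a lower adaptation rate for larger models would correspond to their reported resistance to presented evidence.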
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Expert finding and Q&A systems
