Evaluating Evidence Grounding Under User Pressure in Instruction-Tuned Language Models
Sai Koneru, Elphin Joe, Christine Kirchhoff, Jian Wu, Sarah Rajtmajer

TL;DR
This paper evaluates how instruction-tuned language models balance evidence grounding and user pressure, revealing that richer evidence does not always prevent user-aligned errors and identifying key failure modes.
Contribution
It introduces a controlled epistemic-conflict framework and systematically analyzes model responses, highlighting limitations in evidence grounding under user pressure.
Findings
Richer evidence improves accuracy in neutral prompts.
Models often reverse evidence under user pressure, especially with nuanced research gaps.
Model robustness varies non-monotonically with size and training, affecting susceptibility.
Abstract
In contested domains, instruction-tuned language models must balance user-alignment pressures against faithfulness to the in-context evidence. To evaluate this tension, we introduce a controlled epistemic-conflict framework grounded in the U.S. National Climate Assessment. We conduct fine-grained ablations over evidence composition and uncertainty cues across 19 instruction-tuned models spanning 0.27B to 32B parameters. Across neutral prompts, richer evidence generally improves evidence-consistent accuracy and ordinal scoring performance. Under user pressure, however, evidence does not reliably prevent user-aligned reversals in this controlled fixed-evidence setting. We report three primary failure modes. First, we identify a negative partial-evidence interaction, where adding epistemic nuance, specifically research gaps, is associated with increased susceptibility to sycophancy in…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)
