TL;DR
This paper introduces a three-source interaction framework for LLMs, evaluates 27 models on how they balance internal knowledge against external user and document assertions, and shows that fine-tuning improves their ability to discriminate between helpful and harmful external information.
Contribution
The paper systematically studies how LLMs handle multiple information sources at once (parametric knowledge, user assertions, and document assertions) and proposes fine-tuning to improve source discrimination.
Findings
Most models favor document assertions over user assertions.
Post-training reinforces reliance on document assertions.
Fine-tuning improves models' ability to discriminate between helpful and harmful external information.
Abstract
Large language models (LLMs) often need to balance their internal parametric knowledge with external information, such as user beliefs and content from retrieved documents, in real-world scenarios like RAG or chat-based systems. A model's ability to reliably process these sources is key to system safety. Previous studies on knowledge conflict and sycophancy are limited to a binary conflict paradigm, primarily exploring conflicts between parametric knowledge and either a document or a user, but ignoring the interactive environment where all three sources exist simultaneously. To fill this gap, we propose a three-source interaction framework and systematically evaluate 27 LLMs from 3 families on 2 datasets. Our findings reveal general patterns: most models rely more on document assertions than user assertions, and this preference is reinforced by post-training. Furthermore, our behavioral…
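The three-source setting described in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code: the `ThreeSourceCase` class, the prompt template, the exact-match scoring, and the example question are all hypothetical, chosen only to show how a single query can carry a user assertion and a document assertion alongside the model's own parametric answer, and how one might tally which source the final answer agrees with.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ThreeSourceCase:
    """One evaluation item (hypothetical structure, not taken from the paper)."""
    question: str
    parametric_answer: str   # what the model answers with no external context
    user_assertion: str      # answer claimed by the user
    document_assertion: str  # answer claimed by the retrieved document


def build_prompt(case: ThreeSourceCase) -> str:
    """Combine both external assertions into one prompt (assumed template)."""
    return (
        f"Document: The answer is {case.document_assertion}.\n"
        f"User: I believe the answer is {case.user_assertion}. {case.question}"
    )


def matching_sources(case: ThreeSourceCase, model_answer: str) -> set:
    """Return the names of the sources whose answer equals the model's answer."""
    norm = model_answer.strip().lower()
    candidates = {
        "parametric": case.parametric_answer,
        "user": case.user_assertion,
        "document": case.document_assertion,
    }
    return {name for name, value in candidates.items()
            if value.strip().lower() == norm}


def follow_rates(cases, answers) -> dict:
    """Fraction of cases in which the answer agrees with each source."""
    counts = Counter()
    for case, answer in zip(cases, answers):
        matched = matching_sources(case, answer)
        if not matched:
            counts["other"] += 1
        for name in matched:
            counts[name] += 1
    total = len(cases)
    return {k: counts[k] / total
            for k in ("parametric", "user", "document", "other")}


# Illustrative data only; the answers stand in for real model outputs.
case = ThreeSourceCase(
    question="What is the capital of Australia?",
    parametric_answer="Canberra",
    user_assertion="Sydney",
    document_assertion="Melbourne",
)
rates = follow_rates([case, case], ["Melbourne", "sydney "])
print(rates)
```

Under this sketch, a model that "relies more on document assertions than user assertions" would show a higher `document` rate than `user` rate across many such cases. Exact string matching is a simplification; a real evaluation would need a more robust way to judge answer agreement.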
