Echoes of Agreement: Argument Driven Opinion Shifts in Large Language Models
Avneet Kaur

TL;DR
This paper investigates how supporting or refuting arguments placed in a prompt shift large language models' responses on political topics. Models tend to align with the presented argument, which affects how political bias evaluations should be interpreted.
Contribution
It demonstrates that prompt arguments can sway LLM responses, highlighting the importance of considering argument strength and context in bias assessments.
Findings
Arguments alter model responses towards the argument's direction
Stronger arguments increase the likelihood of response alignment
Models exhibit a sycophantic tendency to agree with presented arguments
Abstract
There have been numerous studies evaluating bias of LLMs towards political topics. However, positions towards these topics in model outputs are highly sensitive to the prompt. What happens when the prompt itself is suggestive of certain arguments towards those positions remains underexplored. This is crucial for understanding how robust these bias evaluations are and for understanding model behaviour, as these models frequently interact with opinionated text. To that end, we conduct experiments for political bias evaluation in the presence of supporting and refuting arguments. Our experiments show that such arguments substantially alter model responses towards the direction of the provided argument in both single-turn and multi-turn settings. Moreover, we find that the strength of these arguments influences the directional agreement rate of model responses. These effects point to a…
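The abstract names a directional agreement rate but this summary does not define it. As a rough illustration only, the sketch below assumes one plausible reading: each response is mapped to a numeric stance score, and an item counts as agreeing when its score moves in the direction of the supplied argument (+1 for a supporting argument, -1 for a refuting one). The function name, the score scale, and the inputs are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def directional_agreement_rate(
    baseline: Sequence[float],
    with_argument: Sequence[float],
    direction: Sequence[int],
) -> float:
    """Fraction of items whose stance score moved in the argument's direction.

    Assumed definition (not from the paper): an item agrees when
    (score_with_argument - baseline_score) * direction > 0, where
    direction is +1 for a supporting argument and -1 for a refuting one.
    """
    if not (len(baseline) == len(with_argument) == len(direction)):
        raise ValueError("all inputs must have the same length")
    if len(baseline) == 0:
        raise ValueError("inputs must not be empty")

    agreeing = 0
    for before, after, sign in zip(baseline, with_argument, direction):
        if sign not in (1, -1):
            raise ValueError("direction entries must be +1 or -1")
        # A positive product means the response shifted toward the argument.
        if (after - before) * sign > 0:
            agreeing += 1
    return agreeing / len(baseline)


# Hypothetical stance scores on a -2..2 scale, for illustration only.
baseline_scores = [0, 1, -1, 2]
argued_scores = [1, 1, -2, 1]
argument_direction = [1, 1, -1, 1]
rate = directional_agreement_rate(baseline_scores, argued_scores, argument_direction)
```

Under this assumed definition, an unchanged response counts as non-agreement, so the rate measures only strict movement toward the argument; a different treatment of ties would give different numbers.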
