Measuring Opinion Bias and Sycophancy via LLM-based Persuasion
Rodrigo Nogueira, Giovana Kerche Bonás, Thales Sales Almeida, Andrea Roque, Ramon Pires, Hugo Abonizio, Thiago Laitz, Celio Larcher, Roseval Malaquias Junior, Marcos Piau

TL;DR
This paper introduces an open-source method to measure the opinions and biases of large language models on contested topics through multi-turn interactions, revealing tendencies like sycophancy and mirroring.
Contribution
It presents a novel probing technique combining direct and indirect methods to uncover LLMs' true opinions and biases during realistic multi-turn dialogues.
Findings
Argumentative debate increases sycophancy 2-3x compared to direct questions.
Models often mirror the user's opinion under sustained argument.
Attacker capability matters more for dislodging an opinion the model already holds than for swaying a model that starts from a neutral position.
Abstract
Large language models increasingly shape the information people consume: they are embedded in search, consulted for professional advice, deployed as agents, and used as a first stop for questions about policy, ethics, health, and politics. When such a model silently holds a position on a contested topic, that position propagates at scale into users' decisions. Eliciting a model's positions is harder than it first appears: contemporary assistants answer direct opinion questions with evasive disclaimers, and the same model may concede the opposite position once the user starts arguing one side. We propose a method, released as the open-source llm-bias-bench, for discovering the opinions an LLM actually holds on contested topics under conditions that resemble real multi-turn interaction. The method pairs two complementary free-form probes. Direct probing asks for the model's opinion across…
