BASIL: Bayesian Assessment of Sycophancy in LLMs
Katherine Atwell, Pedram Heydari, Anthony Sicilia, Malihe Alikhani

TL;DR
This paper introduces a Bayesian framework to measure and analyze sycophantic behavior in large language models, distinguishing it from rational belief updates and improving model calibration.
Contribution
It presents a novel probabilistic approach to quantifying sycophancy that separates it from rational belief updates, and it demonstrates methods to reduce Bayesian inconsistency in LLMs.
Findings
Sycophantic belief shifts are prevalent across multiple LLMs.
Models depart from rational belief updating by either over-updating or under-updating their beliefs.
Calibration and fine-tuning reduce Bayesian inconsistency significantly.
Abstract
Sycophancy (overly agreeable or flattering behavior) poses a fundamental challenge for human-AI collaboration, particularly in high-stakes decision-making domains such as health, law, and education. A central difficulty in studying sycophancy in large language models (LLMs) is disentangling sycophantic belief shifts from rational changes in behavior driven by new evidence or user-provided information. Existing approaches either measure descriptive behavior changes or apply normative evaluations that rely on objective ground truth, limiting their applicability to subjective or uncertain tasks. We introduce a Bayesian probabilistic framework, grounded in behavioral economics and rational decision theory, that explicitly separates sycophancy from rational belief updating. Within this framework, we achieve three objectives: (i) a descriptive metric that measures sycophancy while controlling…
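The abstract is truncated above, so BASIL's exact metric is not given here. The following is a minimal illustrative sketch, not the authors' method, of the general idea of separating rational updating from excess belief shifts. It assumes that the evidential value of the user-provided content can be summarized by a single likelihood ratio, and it compares a model's reported posterior with the Bayes-rational posterior in log-odds space. The function names, the `likelihood_ratio` and `tol` parameters, and the "over-updating"/"under-updating" labels are hypothetical choices made for illustration.

```python
# Hypothetical illustration of a Bayesian consistency check; not the BASIL paper's metric.
import math
from dataclasses import dataclass


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError(f"probability must be in (0, 1), got {p}")
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class UpdateDiagnostics:
    rational_posterior: float  # P(H | E) under Bayes' rule
    rational_shift: float      # log-odds change that Bayes' rule warrants
    observed_shift: float      # log-odds change the model actually made
    sycophantic_excess: float  # excess shift, signed so positive = toward the user's stance
    label: str                 # "consistent", "over-updating", "under-updating", "wrong direction"


def update_diagnostics(prior: float, model_posterior: float,
                       likelihood_ratio: float, user_favors_h: bool,
                       tol: float = 1e-6) -> UpdateDiagnostics:
    """Compare a model's belief shift on hypothesis H with the Bayes-rational shift.

    likelihood_ratio = P(E | H) / P(E | not H) for the user-provided content E
    (an assumption of this sketch); a bare opinion with no evidential value
    would have likelihood_ratio = 1.
    """
    if likelihood_ratio <= 0:
        raise ValueError("likelihood_ratio must be positive")
    # Bayes' rule in log-odds form: logit(posterior) = logit(prior) + log(LR).
    rational_shift = math.log(likelihood_ratio)
    rational_posterior = sigmoid(logit(prior) + rational_shift)
    observed_shift = logit(model_posterior) - logit(prior)
    # Positive excess: the model moved further toward H than the evidence warrants.
    excess = observed_shift - rational_shift
    # Re-sign the excess so that positive always means "toward the user's stance".
    toward_user = excess if user_favors_h else -excess

    if abs(excess) <= tol:
        label = "consistent"
    elif rational_shift * observed_shift >= 0:
        # Same direction as Bayes (or no rational shift at all): compare magnitudes.
        label = ("over-updating" if abs(observed_shift) > abs(rational_shift)
                 else "under-updating")
    else:
        label = "wrong direction"
    return UpdateDiagnostics(rational_posterior, rational_shift,
                             observed_shift, toward_user, label)


# Usage: a 50/50 prior, a bare user opinion (LR = 1), and a model that jumps to 0.8.
d = update_diagnostics(prior=0.5, model_posterior=0.8,
                       likelihood_ratio=1.0, user_favors_h=True)
print(d.label)  # over-updating
```

Working in log-odds makes Bayes' rule additive, so the gap between the observed and rational shifts is a single number. In this sketch, a belief shift in response to content with no evidential value (likelihood ratio 1) counts entirely as excess, which is one simple way to operationalize "sycophantic rather than rational" under the stated assumptions.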
