Sycophancy in Vision-Language Models: A Systematic Analysis and an Inference-Time Mitigation Framework
Yunpu Zhao, Rui Zhang, Junbin Xiao, Changxin Ke, Ruibo Hou, Yifan Hao, Ling Li

TL;DR
This paper systematically analyzes sycophancy in large vision-language models (LVLMs), showing that leading or deceptive prompts bias model outputs, and proposes an inference-time mitigation framework that reduces sycophantic bias without retraining.
Contribution
It introduces a novel, training-free, inference-time framework to mitigate sycophancy in LVLMs, including query neutralization and contrastive decoding techniques.
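The summary names contrastive decoding but does not give the paper's formulation, so the following is only a minimal sketch of a generic contrastive-decoding step, not the authors' method. It assumes two sets of next-token logits are available: one from the neutralized query and one from the original leading query. The function names, the `alpha` weight, and the scoring rule `(1 + alpha) * log p_neutral - alpha * log p_leading` are illustrative assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_next_token(neutral_logits, leading_logits, alpha=1.0):
    """Score tokens by contrasting two next-token distributions.

    Assumed (hypothetical) scoring rule, common in contrastive decoding:
        score = (1 + alpha) * log p_neutral - alpha * log p_leading
    Tokens that are likely only because of the leading prompt are penalized.
    Returns (index of best token, list of scores).
    """
    if len(neutral_logits) != len(leading_logits):
        raise ValueError("both logit vectors must cover the same vocabulary")
    lp_neutral = log_softmax(neutral_logits)
    lp_leading = log_softmax(leading_logits)
    scores = [(1 + alpha) * n - alpha * l
              for n, l in zip(lp_neutral, lp_leading)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy two-token vocabulary: index 0 = "yes", index 1 = "no".
vocab = ["yes", "no"]
neutral = [1.0, 1.2]   # neutral query mildly prefers "no"
leading = [3.0, 0.5]   # leading query pushes strongly toward "yes"
token, _ = contrastive_next_token(neutral, leading, alpha=1.0)
print(vocab[token])
```

In this toy case the leading prompt alone would select "yes", while the contrasted score selects "no". With `alpha=0` the rule reduces to decoding from the neutral distribution only.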
Findings
Framework reduces sycophantic bias across models
Maintains performance on neutral prompts
Uncovers model-specific sycophancy behaviors
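One way to make findings of this kind measurable is to compare a model's answers to the same questions under neutral and leading prompts. The sketch below is an assumption about how such a comparison could be scored; the metric names (`acc_drop`, `flip_rate`) and the function are hypothetical and are not taken from the paper.

```python
def sycophancy_metrics(gold, neutral_preds, leading_preds):
    """Compare predictions under neutral vs. leading prompts.

    All three arguments are equal-length lists of answer labels.
    Returns accuracy under each prompt type, the accuracy drop caused by
    the leading prompt, and the fraction of answers that changed.
    """
    n = len(gold)
    if n == 0 or len(neutral_preds) != n or len(leading_preds) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    acc_neutral = sum(p == g for p, g in zip(neutral_preds, gold)) / n
    acc_leading = sum(p == g for p, g in zip(leading_preds, gold)) / n
    flip_rate = sum(a != b for a, b in zip(neutral_preds, leading_preds)) / n
    return {
        "acc_neutral": acc_neutral,
        "acc_leading": acc_leading,
        "acc_drop": acc_neutral - acc_leading,
        "flip_rate": flip_rate,
    }


# Made-up labels for illustration only, not results from the paper.
gold = ["yes", "no", "no", "yes"]
neutral = ["yes", "no", "no", "no"]
leading = ["yes", "yes", "yes", "no"]
print(sycophancy_metrics(gold, neutral, leading))
```

A mitigation that works as described would shrink `acc_drop` and `flip_rate` while leaving `acc_neutral` roughly unchanged.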
Abstract
Large Vision-Language Models (LVLMs) have shown significant capability in vision-language understanding. However, one critical issue that persists in these models is sycophancy, where models are unduly influenced by leading or deceptive prompts, resulting in biased outputs and hallucinations. Despite the rapid development of LVLMs, evaluating and mitigating sycophancy remains largely under-explored. In this work, we fill this gap by systematically analyzing sycophancy across multiple vision-language benchmarks and proposing an inference-time mitigation framework. We curate leading queries and quantify the susceptibility of state-of-the-art LVLMs to prompt-induced bias, revealing consistent performance degradation and instability across models and tasks. Our analysis further uncovers model-specific behavioral traits, such as sentiment sensitivity and prediction polarity shifts under…
Taxonomy
Topics: Multimodal Machine Learning Applications
