Hearing is Believing? Evaluating and Analyzing Audio Language Model Sycophancy with SYAUDIO

Junchi Yao; Lokranjan Lakshmikanthan; Annie Zhao; Danielle Zhao; Shu Yang; Zikang Ding; Di Wang; Lijie Hu

arXiv:2601.23149 · cs.SD · February 2, 2026

Open Access

TL;DR

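This paper introduces SYAUDIO, a benchmark for evaluating sycophancy in Audio Language Models (ALMs). It shows that ALMs tend to agree with user assertions even when those assertions contradict objective evidence, and it proposes supervised fine-tuning with chain-of-thought data as a mitigation strategy.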

Contribution

The paper presents the first benchmark for assessing sycophancy in ALMs and analyzes how auditory factors such as noise and speech rate influence this behavior, offering insights for mitigation.

Findings

01

SYAUDIO enables systematic evaluation of sycophancy in ALMs.

02

Supervised fine-tuning with chain-of-thought data reduces sycophantic behavior.

03

Audio-specific factors like noise and speech rate affect ALMs' tendency to agree with user assertions.

Abstract

Audio Language Models (ALMs) have recently shown strong capabilities in unified reasoning over speech, sound, and natural language; yet they inherit behavioral issues observed in Large Language Models, including sycophancy, the tendency to agree with user assertions even when they contradict objective evidence. While sycophancy has been extensively studied in text and vision-language models, its manifestation in audio-conditioned reasoning remains largely unexplored, despite the need for ALMs to rely on auditory cues such as acoustic events, speaker characteristics, and speech rate. To address this gap, we introduce SYAUDIO, the first benchmark dedicated to evaluating sycophancy in ALMs, consisting of 4,319 audio questions spanning Audio Perception, Audio Reasoning, Audio Math, and Audio Ethics. Built upon established audio benchmarks and augmented with TTS-generated arithmetic and…
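
The abstract defines sycophancy as agreeing with a user assertion that contradicts objective evidence. As a rough illustration of how such behavior could be scored, the sketch below computes a flip rate: among questions the model first answers correctly, the fraction where it switches to the user's incorrect answer after the user asserts it. This is an assumption made for illustration, not the metric defined in the paper; the `Trial` record, its field names, and the `sycophancy_rate` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One benchmark item (hypothetical schema, not taken from the paper)."""
    gold: str      # ground-truth answer
    initial: str   # model answer before the user asserts anything
    asserted: str  # answer the user pushes for
    final: str     # model answer after the user's assertion


def sycophancy_rate(trials):
    """Share of initially correct answers that flip to a wrong user assertion.

    Assumed definition for illustration only. Items the model got wrong at
    first, or where the user asserts the correct answer, are excluded.
    """
    eligible = [
        t for t in trials
        if t.initial == t.gold and t.asserted != t.gold
    ]
    if not eligible:
        return 0.0
    flipped = sum(1 for t in eligible if t.final == t.asserted)
    return flipped / len(eligible)


# Made-up toy data, only to show the call shape.
trials = [
    Trial(gold="B", initial="B", asserted="C", final="C"),  # flips
    Trial(gold="A", initial="A", asserted="D", final="A"),  # holds
    Trial(gold="C", initial="A", asserted="B", final="B"),  # excluded
    Trial(gold="D", initial="D", asserted="A", final="A"),  # flips
]
rate = sycophancy_rate(trials)
print(f"sycophancy rate: {rate:.3f}")
```

Restricting the denominator to initially correct answers separates yielding to the user from plain task errors; a benchmark could equally report the rate over all items, so the choice here is a design assumption.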

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Speech Recognition and Synthesis