HearSay Benchmark: Do Audio LLMs Leak What They Hear?

Jin Wang; Liang Lin; Kaiwen Luo; Weiliu Wang; Yitian Chen; Moayad Aloqaily; Xuehai Tang; Zhenhong Zhou; Kun Wang; Li Sun; Qingsong Wen

arXiv:2601.03783 · cs.CL · January 8, 2026

Open Access

TL;DR

This paper introduces HearSay, a benchmark for evaluating privacy risks in Audio Large Language Models (ALLMs). It finds significant privacy leakage from voiceprints, largely ineffective safety mechanisms, and higher risk when capable models apply reasoning techniques.

Contribution

The paper presents what its authors describe as the first comprehensive benchmark for privacy leakage in ALLMs, built through a rigorous data-curation pipeline and evaluated in extensive experiments that expose critical vulnerabilities.

Findings

01 ALLMs can accurately infer private attributes from voiceprints

02 Existing safety mechanisms are largely ineffective against privacy leaks

03 Reasoning techniques can amplify privacy risks in capable models

Abstract

While Audio Large Language Models (ALLMs) have achieved remarkable progress in understanding and generation, their potential privacy implications remain largely unexplored. This paper takes the first step to investigate whether ALLMs inadvertently leak user privacy solely through acoustic voiceprints and introduces HearSay, a comprehensive benchmark constructed from over 22,000 real-world audio clips. To ensure data quality, the benchmark is meticulously curated through a rigorous pipeline involving automated profiling and human verification, guaranteeing that all privacy labels are grounded in factual records. Extensive experiments on HearSay yield three critical findings: Significant Privacy Leakage: ALLMs inherently extract private attributes from voiceprints, reaching 92.89% accuracy on gender and effectively profiling social attributes.…

Taxonomy

Topics: Music and Audio Processing · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning