Protecting Bystander Privacy via Selective Hearing in Audio LLMs
Xiao Zhan, Guangzhi Sun, Jose Such, Phil Woodland

TL;DR
This paper introduces SH-Bench, a benchmark for evaluating selective hearing in audio LLMs to protect bystander privacy, and proposes BPFT, a training method to improve privacy without sacrificing comprehension.
Contribution
It presents the first benchmark and metric for assessing bystander privacy in audio LLMs, along with a novel fine-tuning approach to enhance privacy protection.
Findings
State-of-the-art models leak significant bystander information.
BPFT improves bystander privacy accuracy by 47%.
SH-Bench enables systematic evaluation of privacy in audio LLMs.
Abstract
Audio large language models (LLMs) are increasingly deployed in the real world, where they inevitably capture speech from unintended nearby bystanders, raising privacy risks that existing benchmarks and defences have not considered. We introduce SH-Bench, the first benchmark designed to evaluate selective hearing: a model's ability to attend to an intended main speaker while refusing to process or reveal information about incidental bystander speech. SH-Bench contains 3,968 multi-speaker audio mixtures, covering both real-world and synthetic scenarios, paired with 77k multiple-choice questions that probe models under general and selective operating modes. In addition, we propose Selective Efficacy (SE), a novel metric that captures both multi-speaker comprehension and bystander-privacy protection. Our evaluation of state-of-the-art open-source and proprietary LLMs reveals substantial…
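The abstract describes scoring models on multiple-choice questions in two operating modes and folding comprehension and privacy into one number, but it does not give the SE formula. The sketch below only illustrates the general idea under stated assumptions. The `MCQItem` schema, the mode labels "general" and "selective", and the premise that a selective-mode question's gold option is the privacy-preserving answer (for example, a refusal) are all hypothetical. The harmonic mean in `combined_score` is a swapped-in choice meant to show a score that stays low unless both accuracies are high. It is not the paper's definition of Selective Efficacy.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MCQItem:
    """One multiple-choice question (hypothetical schema, not SH-Bench's actual format)."""
    mode: str        # assumed labels: "general" or "selective"
    answer: str      # gold option letter; in selective mode assumed to be the privacy-preserving option
    prediction: str  # option letter chosen by the model


def mode_accuracy(items: Sequence[MCQItem], mode: str) -> float:
    """Fraction of questions in `mode` whose prediction matches the gold option."""
    subset = [it for it in items if it.mode == mode]
    if not subset:
        return 0.0
    correct = sum(it.prediction == it.answer for it in subset)
    return correct / len(subset)


def combined_score(comprehension: float, privacy: float) -> float:
    """Harmonic mean of two accuracies in [0, 1].

    ASSUMPTION: illustrative only; this is NOT the paper's SE formula.
    The harmonic mean is low whenever either input is low.
    """
    if comprehension + privacy == 0:
        return 0.0
    return 2 * comprehension * privacy / (comprehension + privacy)


# Toy usage with made-up predictions (not SH-Bench data).
items = [
    MCQItem("general", "A", "A"),
    MCQItem("general", "C", "B"),
    MCQItem("selective", "D", "D"),
    MCQItem("selective", "D", "D"),
]
comp = mode_accuracy(items, "general")    # 0.5
priv = mode_accuracy(items, "selective")  # 1.0
print(round(combined_score(comp, priv), 3))  # 0.667
```

A harmonic mean is used here because a model that answers every question, including those about bystanders, can score well on comprehension while failing on privacy. Averaging the two accuracies arithmetically would hide that failure, whereas the harmonic mean penalises it.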
