TL;DR
VoxSafeBench is a comprehensive benchmark that evaluates speech language models on safety, fairness, and privacy while accounting for who is speaking, how they sound, and where the conversation takes place. Its results reveal gaps in the capabilities of current models.
Contribution
It introduces a novel two-tier benchmark for assessing the social alignment of speech models across three dimensions (safety, fairness, and privacy), highlighting the limitations of existing safeguards.
Findings
Safeguards effective on text often degrade in speech contexts.
Models detect acoustic cues but fail to act appropriately on them.
Current speech models struggle with speaker- and scene-conditioned risks.
Abstract
As speech language models (SLMs) transition from personal devices into shared, multi-user environments, their responses must account for far more than the words alone. Who is speaking, how they sound, and where the conversation takes place can each turn an otherwise benign request into one that is unsafe, unfair, or privacy-violating. Existing benchmarks, however, largely focus on basic audio comprehension, study individual risks in isolation, or conflate content that is inherently harmful with content that only becomes problematic due to its acoustic context. We introduce VoxSafeBench, among the first benchmarks to jointly evaluate social alignment in SLMs across three dimensions: safety, fairness, and privacy. VoxSafeBench adopts a Two-Tier design: Tier1 evaluates content-centric risks using matched text and audio inputs, while Tier2 targets audio-conditioned risks in which the…
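Because Tier1 pairs each request with matched text and audio inputs, the finding that text safeguards degrade in speech can be measured item by item. The sketch below is a hypothetical illustration, not VoxSafeBench's actual scoring. It assumes each model response has already been judged safe or unsafe, and the names `Tier1Result`, `modality_gap`, and `flipped_items` are invented here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Tier1Result:
    """Hypothetical record: one request posed once as text and once as speech."""
    item_id: str
    text_safe: bool   # response to the text version was judged safe
    audio_safe: bool  # response to the spoken version was judged safe


def safe_rate(flags: Iterable[bool]) -> float:
    """Fraction of responses judged safe; 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def modality_gap(results: list[Tier1Result]) -> float:
    """Text safe rate minus audio safe rate (positive: safeguards weaken in speech)."""
    text_rate = safe_rate(r.text_safe for r in results)
    audio_rate = safe_rate(r.audio_safe for r in results)
    return text_rate - audio_rate


def flipped_items(results: list[Tier1Result]) -> list[str]:
    """IDs handled safely as text but unsafely as speech; only matched pairs expose these."""
    return [r.item_id for r in results if r.text_safe and not r.audio_safe]


if __name__ == "__main__":
    results = [
        Tier1Result("a", True, True),
        Tier1Result("b", True, False),
        Tier1Result("c", True, False),
        Tier1Result("d", False, False),
    ]
    print(modality_gap(results))   # 0.75 - 0.25 = 0.5
    print(flipped_items(results))  # ['b', 'c']
```

Comparing matched pairs, rather than two separately sampled text and audio sets, attributes any difference to the input modality instead of to differences in request content.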
