AudioGuard: Toward Comprehensive Audio Safety Protection Across Diverse Threat Models
Mintong Kang, Chen Fang, Bo Li

TL;DR
AudioGuard introduces a comprehensive audio safety framework and a new benchmark, addressing diverse threat models and improving detection accuracy while lowering latency.
Contribution
The paper presents AudioSafetyBench, the first policy-based audio safety benchmark, and proposes AudioGuard, a unified guardrail for waveform and semantic protection.
Findings
AudioGuard outperforms strong baselines in accuracy.
AudioSafetyBench covers multiple languages and threat types.
AudioGuard achieves lower latency in detection.
Abstract
Audio has rapidly become a primary interface for foundation models, powering real-time voice assistants. Ensuring safety in audio systems is inherently more complex than guarding against "unsafe text spoken aloud": real-world risks can hinge on audio-native harmful sound events, speaker attributes (e.g., a child's voice), impersonation and voice-cloning misuse, and voice-content compositional harms, such as a child's voice combined with sexual content. These properties of audio make it challenging to develop comprehensive benchmarks or guardrails for this distinctive risk landscape. To close this gap, we conduct large-scale red teaming of audio systems, systematically uncover audio-specific vulnerabilities, and develop a comprehensive, policy-grounded audio risk taxonomy and AudioSafetyBench, the first policy-based audio safety benchmark across diverse threat models. AudioSafetyBench supports diverse languages, suspicious voices…
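To make the idea of a voice-content compositional harm concrete, here is a minimal Python sketch under stated assumptions. It is not AudioGuard's method: the label names, the two rule sets, and the `AudioSignals` and `violations` helpers are hypothetical, and the speaker-attribute and content labels are assumed to come from separate upstream classifiers (one on the waveform, one on the transcript). The point it illustrates is that a rule keyed on content alone cannot flag a case like a child's voice combined with sexual content, because that condition needs both a speaker attribute and a content label.

```python
from dataclasses import dataclass

# Hypothetical policy: content labels that are unsafe regardless of who speaks.
STANDALONE_UNSAFE = {"violent_threat"}
# Hypothetical policy: (speaker attribute, content label) pairs unsafe only together.
COMPOSITIONAL_UNSAFE = {("child_voice", "sexual")}


@dataclass(frozen=True)
class AudioSignals:
    speaker_attrs: frozenset[str]   # assumed output of a waveform-level classifier
    content_labels: frozenset[str]  # assumed output of a transcript-level classifier


def violations(sig: AudioSignals) -> list[str]:
    """Return the hypothetical policy rules triggered by one audio clip."""
    # Content-only rules: triggered by the transcript label alone.
    found = sorted(sig.content_labels & STANDALONE_UNSAFE)
    # Compositional rules: triggered only when the voice attribute and the
    # content label co-occur in the same clip.
    for attr, label in sorted(COMPOSITIONAL_UNSAFE):
        if attr in sig.speaker_attrs and label in sig.content_labels:
            found.append(f"{attr}+{label}")
    return found


adult = AudioSignals(frozenset({"adult_voice"}), frozenset({"sexual"}))
child = AudioSignals(frozenset({"child_voice"}), frozenset({"sexual"}))
print(violations(adult))  # []
print(violations(child))  # ['child_voice+sexual']
```

Both clips carry the same content label, so a transcript-only check would treat them identically; only the rule that also consults the speaker attribute separates them.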
