AudioGuard: Toward Comprehensive Audio Safety Protection Across Diverse Threat Models

Mintong Kang; Chen Fang; Bo Li

arXiv:2604.08867 · cs.SD · April 13, 2026

TL;DR

AudioGuard is a unified audio safety guardrail introduced alongside a new benchmark, AudioSafetyBench. It targets diverse threat models and is reported to improve detection accuracy over strong baselines while lowering latency.

Contribution

The paper presents AudioSafetyBench, the first policy-based audio safety benchmark, and proposes AudioGuard, a unified guardrail for waveform and semantic protection.

Findings

01 AudioGuard outperforms strong baselines in accuracy.

02 AudioSafetyBench covers multiple languages and threat types.

03 AudioGuard achieves lower latency in detection.

Abstract

Audio has rapidly become a primary interface for foundation models, powering real-time voice assistants. Ensuring safety in audio systems is inherently more complex than just "unsafe text spoken aloud": real-world risks can hinge on audio-native harmful sound events, speaker attributes (e.g., child voice), impersonation/voice-cloning misuse, and voice-content compositional harms, such as child voice plus sexual content. The nature of audio makes it challenging to develop comprehensive benchmarks or guardrails against this unique risk landscape. To close this gap, we conduct large-scale red teaming on audio systems, systematically uncover vulnerabilities in audio, and develop a comprehensive, policy-grounded audio risk taxonomy and AudioSafetyBench, the first policy-based audio safety benchmark across diverse threat models. AudioSafetyBench supports diverse languages, suspicious voices…
