SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations
Zhaorun Chen, Francesco Pinto, Minzhou Pan, Bo Li

TL;DR
SafeWatch is an efficient multi-label video guardrail model that follows safety policies with content-specific explanations, outperforming existing methods in accuracy and efficiency while providing comprehensive safety coverage.
Contribution
We introduce SafeWatch, a multi-label video safety guardrail model with explainability, built on a novel parallel encoding and token pruning approach, and SafeWatch-Bench, a large-scale benchmark.
Findings
Outperforms SOTA by 28.2% on SafeWatch-Bench
Reduces computational costs by 10%
Provides validated, content-specific safety explanations
Abstract
With the rise of generative AI and the rapid growth of high-quality video generation, video guardrails have become more crucial than ever to ensure safety and security across platforms. Current video guardrails, however, either are overly simplistic, relying on pure classification models that are trained on simple policies with limited unsafe categories and lack detailed explanations, or prompt multimodal large language models (MLLMs) with long safety guidelines, which is inefficient and impractical for guardrailing real-world content. To bridge this gap, we propose SafeWatch, an efficient MLLM-based video guardrail model designed to follow customized safety policies and provide multi-label video guardrail outputs with content-specific explanations in a zero-shot manner. In particular, unlike traditional MLLM-based guardrails that encode all safety policies autoregressively, causing…
Taxonomy
Topics: Transportation Safety and Impact Analysis · Vehicular Ad Hoc Networks (VANETs)
Methods: Pruning
