DeepSVU: Towards In-depth Security-oriented Video Understanding via Unified Physical-world Regularized MoE

Yujie Jin; Wenxin Zhang; Jingjing Wang; Guodong Zhou

arXiv:2602.18019·cs.CV·February 23, 2026

Open Access

TL;DR

This paper introduces DeepSVU, a novel security-oriented video understanding task that not only detects threats but also attributes and evaluates their causes, utilizing a unified physical-world regularized MoE approach.

Contribution

The paper proposes the DeepSVU task and UPRM (Unified Physical-world Regularized MoE), a model that incorporates physical-world information for in-depth threat understanding in videos.

Findings

01. UPRM outperforms existing methods on DeepSVU datasets.

02. Physical-world information significantly improves threat attribution.

03. The approach effectively models coarse-to-fine physical-world cues.

Abstract

Prior research on Security-oriented Video Understanding (SVU) has predominantly focused on detecting and localizing threats (e.g., shootings, robberies) in videos, while largely lacking the capability to generate and evaluate the causes of those threats. Motivated by these gaps, this paper introduces a new chat-paradigm SVU task, In-depth Security-oriented Video Understanding (DeepSVU), which aims not only to identify and locate threats but also to attribute and evaluate the causes of the threatening segments. Furthermore, this paper identifies two key challenges in the proposed task: 1) how to effectively model coarse-to-fine physical-world information (e.g., human behavior, object interactions, and background context) to boost the DeepSVU task; and 2) how to adaptively trade off these factors. To tackle these challenges, this paper proposes a new Unified…
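The second challenge, adaptively trading off several sources of physical-world information, is the kind of problem a mixture-of-experts (MoE) gate typically addresses. The sketch below is a generic soft-gated MoE in plain Python, not the paper's UPRM model. The expert names simply mirror the abstract's examples, and the feature vectors and gate scores are hypothetical values chosen for illustration.

```python
import math


def softmax(scores):
    """Turn raw gate scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(expert_outputs, gate_scores):
    """Fuse expert feature vectors as a gate-weighted sum.

    Generic soft gating; an assumption for illustration, not the
    mechanism described in the paper.
    """
    weights = softmax(gate_scores)
    dim = len(expert_outputs[0])
    fused = [0.0] * dim
    for w, out in zip(weights, expert_outputs):
        for i in range(dim):
            fused[i] += w * out[i]
    return fused, weights


# Hypothetical 2-d features from three experts (names follow the abstract's examples).
experts = {
    "human_behavior": [1.0, 0.0],
    "object_interactions": [0.0, 1.0],
    "background_context": [0.5, 0.5],
}

# Hypothetical gate scores: this input favors the human-behavior expert.
gate_scores = [2.0, 0.0, 0.0]

fused, weights = mix_experts(list(experts.values()), gate_scores)
print(dict(zip(experts, weights)))
print(fused)
```

In a trained model the gate scores would be produced by a learned network conditioned on the input, so the weighting changes from one video segment to the next instead of being fixed by hand as it is here.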

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning