Dual Branch VideoMamba with Gated Class Token Fusion for Violence Detection

Damith Chamalke Senadeera; Xiaoyun Yang; Shibo Li; Muhammad Awais; Dimitrios Kollias; Gregory Slabaugh

arXiv:2506.03162·cs.CV·September 29, 2025

TL;DR

This paper introduces Dual Branch VideoMamba with Gated Class Token Fusion, an efficient model combining spatial and temporal features for violence detection, achieving state-of-the-art results on a new comprehensive benchmark.

Contribution

The paper presents a dual-branch architecture with gated class token fusion, together with a new violence detection benchmark built by merging the RWF-2000, RLVS, SURV and VioPeru datasets, with the aim of improving both accuracy and efficiency.

Findings

01 Achieves state-of-the-art performance on the new benchmark

02 Balances accuracy and computational efficiency effectively

03 Demonstrates the effectiveness of SSMs for real-time surveillance

Abstract

The rapid proliferation of surveillance cameras has increased the demand for automated violence detection. While CNNs and Transformers have shown success in extracting spatio-temporal features, they struggle with long-term dependencies and computational efficiency. We propose Dual Branch VideoMamba with Gated Class Token Fusion (GCTF), an efficient architecture combining a dual-branch design and a state-space model (SSM) backbone where one branch captures spatial features, while the other focuses on temporal dynamics. The model performs continuous fusion via a gating mechanism between the branches to enhance the model's ability to detect violent activities even in challenging surveillance scenarios. We also present a new benchmark by merging RWF-2000, RLVS, SURV and VioPeru datasets in video violence detection, ensuring strict separation between training and testing sets. Experimental…
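The gating idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the fusion formula, so the code below assumes a common form of gated fusion in which a per-dimension gate, obtained by passing a learnable logit through a sigmoid, forms a convex combination of the spatial and temporal class tokens. The names `gated_class_token_fusion`, `cls_spatial`, `cls_temporal` and `gate_logits` are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_class_token_fusion(cls_spatial, cls_temporal, gate_logits):
    """Blend two class-token vectors with a per-dimension sigmoid gate.

    ASSUMPTION: the fusion form g * spatial + (1 - g) * temporal is an
    illustrative choice, not taken from the paper.
    """
    if not (len(cls_spatial) == len(cls_temporal) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for s, t, z in zip(cls_spatial, cls_temporal, gate_logits):
        g = sigmoid(z)  # gate near 1 favours the spatial branch
        fused.append(g * s + (1.0 - g) * t)
    return fused


# Toy class tokens from the two branches (made-up values).
spatial = [1.0, 0.0, 2.0]
temporal = [0.0, 1.0, 4.0]

# Zero logits give a gate of 0.5, i.e. a plain average of the two tokens.
fused = gated_class_token_fusion(spatial, temporal, [0.0, 0.0, 0.0])
print(fused)
```

In a trained model the gate logits would be learned (or predicted from the tokens themselves), so each feature dimension could lean on whichever branch is more informative; applying such a fusion after several successive layers would be one way to read the "continuous fusion" mentioned in the abstract.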
