M2S2L: Mamba-based Multi-Scale Spatial-temporal Learning for Video Anomaly Detection

Yang Liu; Boan Chen; Xiaoguang Zhu; Jing Liu; Peng Sun; Wei Zhou

arXiv:2511.05564 · cs.CV · November 11, 2025

TL;DR

This paper introduces M2S2L, a hierarchical multi-scale spatial-temporal learning framework for video anomaly detection that combines high accuracy with computational efficiency, making it suitable for real-time surveillance.

Contribution

It proposes a novel Mamba-based multi-scale spatial-temporal model with feature decomposition for improved behavioral modeling in video anomaly detection.
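
Because the model is described as Mamba-based, a small sketch of the selective state-space recurrence at the core of Mamba may help readers unfamiliar with it. It follows the simplified discretisation used in the original Mamba work (Ā = exp(ΔA), B̄ ≈ ΔB, with Δ, B and C computed from the current input) and applies it causally over per-frame feature vectors. This is a toy under stated assumptions, not M2S2L's encoder: the names selective_scan, W_delta, W_B and W_C, the sizes, and the random weights are hypothetical, and the gating, convolution, and skip connection of a full Mamba block are omitted.

```python
import math
import random

# Hypothetical toy sketch of a Mamba-style selective scan; NOT the M2S2L encoder.


def softplus(v):
    """Numerically stable softplus, used to keep the step size positive."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def matvec(x, W):
    """Row vector x (length D) times matrix W (D x M); returns a list of length M."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def selective_scan(xs, A, W_delta, W_B, W_C):
    """Causal selective-SSM scan over a sequence of D-dim frame features.

    A is D x N with negative entries (stable decay); W_delta is D x D,
    W_B and W_C are D x N. Returns one D-dim output per input step.
    """
    D, N = len(A), len(A[0])
    h = [[0.0] * N for _ in range(D)]  # hidden state, D x N, reset on every call
    ys = []
    for x in xs:
        delta = [softplus(v) for v in matvec(x, W_delta)]  # input-dependent step, length D
        B = matvec(x, W_B)  # input-dependent B_t, length N
        C = matvec(x, W_C)  # input-dependent C_t, length N
        for d in range(D):
            for n in range(N):
                a_bar = math.exp(delta[d] * A[d][n])  # A_bar = exp(delta * A)
                # state update with the simplified B_bar = delta * B
                h[d][n] = a_bar * h[d][n] + delta[d] * B[n] * x[d]
        ys.append([sum(h[d][n] * C[n] for n in range(N)) for d in range(D)])  # y_t = h_t C_t
    return ys


# Toy usage: 16 "frames" of 8-dim features, state size 4, random (hypothetical) weights.
rng = random.Random(0)
T, D, N = 16, 8, 4
xs = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(T)]
A = [[-math.exp(rng.gauss(0, 1)) for _ in range(N)] for _ in range(D)]
W_delta = [[0.1 * rng.gauss(0, 1) for _ in range(D)] for _ in range(D)]
W_B = [[0.1 * rng.gauss(0, 1) for _ in range(N)] for _ in range(D)]
W_C = [[0.1 * rng.gauss(0, 1) for _ in range(N)] for _ in range(D)]
ys = selective_scan(xs, A, W_delta, W_B, W_C)
```

The loop makes the causal, linear-time nature of the scan explicit: each output depends only on the current and earlier frames. Practical Mamba implementations replace the Python loop with a hardware-aware parallel scan.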

Findings

01. Achieves high frame-level AUCs on benchmark datasets.
02. Maintains a real-time inference speed of 45 FPS.
03. Operates efficiently at 20.1G FLOPs.
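
Frame-level AUC, the usual VAD metric, scores every test frame and computes the ROC AUC against per-frame ground-truth labels. The sketch below computes it with the pairwise (Mann-Whitney) formulation, and also turns the two efficiency figures into an implied compute rate and per-frame latency. Assumptions: this page does not say whether the AUCs pool all test frames or average per video, nor whether 20.1G FLOPs is per frame or per input clip (or whether it counts multiply-accumulates); compute_budget treats it as per processed frame. Both function names are hypothetical.

```python
def frame_level_auc(scores, labels):
    """ROC AUC over per-frame anomaly scores (label 1 = anomalous, 0 = normal).

    Equals the probability that a randomly chosen anomalous frame scores higher
    than a randomly chosen normal frame; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one normal and one anomalous frame")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def compute_budget(gflops_per_frame, fps):
    """Sustained GFLOP/s and per-frame latency (ms), ASSUMING the FLOPs figure is per frame."""
    return gflops_per_frame * fps, 1000.0 / fps


# Toy scores for four frames; the last two are anomalous.
print(frame_level_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
gflops, ms = compute_budget(20.1, 45)
print(round(gflops, 1), round(ms, 1))  # 904.5 22.2
```

Under the per-frame assumption, 45 FPS at 20.1 GFLOPs per frame corresponds to roughly 0.9 TFLOP/s of sustained compute and about 22 ms per frame.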

Abstract

Video anomaly detection (VAD) is an essential task in the image processing community with promising applications in video surveillance, and it faces a fundamental challenge in balancing detection accuracy with computational efficiency. As video content grows increasingly complex, with diverse behavioral patterns and contextual scenarios, traditional VAD approaches struggle to provide robust assessment for modern surveillance systems. Existing methods either lack comprehensive spatial-temporal modeling or require excessive computational resources for real-time applications. To address this, we present a Mamba-based multi-scale spatial-temporal learning (M2S2L) framework. The proposed method employs hierarchical spatial encoders operating at multiple granularities and multi-temporal encoders capturing motion dynamics across different time scales. We also introduce a feature decomposition…
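
The idea of encoding space at several granularities and time at several scales can be illustrated with a toy sketch; it is not the M2S2L architecture. Spatial granularity is approximated by average-pooling a single-channel feature map into 1×1, 2×2 and 4×4 grids, and temporal scale by running a causal mixer over frames subsampled at strides 1, 2 and 4, holding each coarse output for s frames and averaging the branches. The grid sizes, strides, averaging fusion, and the exponential-moving-average mixer (a fixed, input-independent stand-in for a Mamba block) are all assumptions, and every function name is hypothetical.

```python
# Hypothetical toy sketch of multi-granularity spatial pooling plus multi-stride
# temporal mixing; NOT the M2S2L architecture.


def pool_grid(fmap, g):
    """Average-pool an H x W map into a g x g grid; returns g*g values, row-major."""
    H, W = len(fmap), len(fmap[0])
    if H % g or W % g:
        raise ValueError("H and W must be divisible by g")
    ph, pw = H // g, W // g
    tokens = []
    for gi in range(g):
        for gj in range(g):
            block = [fmap[i][j]
                     for i in range(gi * ph, (gi + 1) * ph)
                     for j in range(gj * pw, (gj + 1) * pw)]
            tokens.append(sum(block) / len(block))
    return tokens


def multi_scale_tokens(fmap, grids=(1, 2, 4)):
    """Concatenate pooled tokens from coarse (1 x 1) to fine (4 x 4) grids."""
    out = []
    for g in grids:
        out.extend(pool_grid(fmap, g))
    return out


def ema_mixer(seq, a=0.8):
    """Causal stand-in for a Mamba block: y_t = a * y_{t-1} + (1 - a) * x_t, per channel."""
    y = [0.0] * len(seq[0])
    out = []
    for x in seq:
        y = [a * yi + (1 - a) * xi for yi, xi in zip(y, x)]
        out.append(y)
    return out


def multi_temporal(seq, strides=(1, 2, 4), mixer=ema_mixer):
    """Mix each stride-subsampled sequence, hold outputs to full length, then average."""
    T, D = len(seq), len(seq[0])
    branches = []
    for s in strides:
        coarse = mixer(seq[::s])
        # coarse[t // s] only uses frames up to t, so every branch stays causal
        branches.append([coarse[t // s] for t in range(T)])
    return [[sum(b[t][d] for b in branches) / len(branches) for d in range(D)]
            for t in range(T)]


# Toy usage: 8 frames of 8 x 8 maps -> 1 + 4 + 16 = 21 tokens per frame.
frames = [[[float(t + 0.01 * (i + j)) for j in range(8)] for i in range(8)] for t in range(8)]
seq = [multi_scale_tokens(f) for f in frames]
fused = multi_temporal(seq)
```

Averaging is the simplest possible fusion; a learned model would more likely fuse scales with trainable weights, which this sketch deliberately leaves out.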

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Human Pose and Action Recognition · Video Surveillance and Tracking Methods