ESOM: Efficiently Understanding Streaming Video Anomalies with Open-world Dynamic Definitions

Zihao Liu; Xiaoyu Wu; Wenna Li; Jianqin Wu; Linlin Yang

arXiv:2604.07772·cs.CV·April 10, 2026

TL;DR

ESOM is a novel, training-free streaming video anomaly detection model that efficiently handles dynamic definitions and provides real-time performance with state-of-the-art results.

Contribution

The paper introduces ESOM, a training-free, efficient streaming OWVAD model with modules for normalization, token merging, memory, and scoring, plus a new benchmark dataset.

Findings

1. Achieves real-time efficiency on a single GPU.
2. Outperforms existing methods in anomaly localization and classification.
3. Provides accurate anomaly description generation.

Abstract

Open-world video anomaly detection (OWVAD) aims to detect and explain abnormal events under different anomaly definitions, which is important for applications such as intelligent surveillance and live-streaming content moderation. Recent MLLM-based methods have shown promising open-world generalization, but still suffer from three major limitations: inefficiency for practical deployment, lack of streaming processing adaptation, and limited support for dynamic anomaly definitions in both modeling and evaluation. To address these issues, this paper proposes ESOM, an efficient streaming OWVAD model that operates in a training-free manner. ESOM includes a Definition Normalization module to structure user prompts for reducing hallucination, an Inter-frame-matched Intra-frame Token Merging module to compress redundant visual tokens, a Hybrid Streaming Memory module for efficient causal…
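To make the idea of compressing redundant visual tokens concrete, here is a minimal sketch of similarity-based token merging. It is not ESOM's Inter-frame-matched Intra-frame Token Merging module, whose details are not given in this summary. It is a generic greedy scheme, and the function names (`cosine`, `merge_redundant_tokens`), the `threshold` parameter, and the choice of averaging merged tokens are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def merge_redundant_tokens(tokens, threshold=0.9):
    """Greedily merge near-duplicate tokens (hypothetical helper, not the paper's method).

    Each token joins the existing group whose mean it is most similar to,
    provided that similarity is at least `threshold`; otherwise it starts
    a new group. The output is one mean vector per group.
    """
    groups = []  # each entry is [sum_vector, count]
    for tok in tokens:
        best_idx, best_sim = -1, threshold
        for i, (vec_sum, count) in enumerate(groups):
            mean = [v / count for v in vec_sum]
            sim = cosine(tok, mean)
            if sim >= best_sim:
                best_idx, best_sim = i, sim
        if best_idx < 0:
            groups.append([list(tok), 1])
        else:
            vec_sum, count = groups[best_idx]
            groups[best_idx] = [[s + t for s, t in zip(vec_sum, tok)], count + 1]
    return [[v / count for v in vec_sum] for vec_sum, count in groups]


# Toy example: the first two tokens are nearly identical, the third is orthogonal.
tokens = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
merged = merge_redundant_tokens(tokens, threshold=0.9)
print(len(tokens), "->", len(merged))
```

In this toy input, the first two vectors collapse into one averaged token while the third is kept separate, so fewer tokens reach the language model. A real system would operate on batched tensors on the GPU rather than Python lists.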

Code & Models

Repositories

Kamino666/ESOM_OpenDef-Bench (GitHub)