TL;DR
This paper advocates an event-centric evaluation approach for pose-based Video Anomaly Detection (VAD). It highlights the limitations of frame-level metrics and proposes an event-based evaluation standard and temporal localization strategies aimed at better real-world applicability.
Contribution
The paper introduces an event-based evaluation standard for VAD, analyzes existing benchmarks, and proposes two strategies for temporal event localization.
Findings
State-of-the-art models perform poorly at event-level localization, with event-level precision below 10%.
Existing benchmarks are misaligned with real-world event detection needs.
The proposed methods improve event detection metrics but still reveal significant performance gaps.
Abstract
Pose-based Video Anomaly Detection (VAD) has gained significant attention for its privacy-preserving nature and robustness to environmental variations. However, traditional frame-level evaluations treat video as a collection of isolated frames, an approach fundamentally misaligned with how anomalies manifest and are acted upon in the real world. In operational surveillance systems, what matters is not the flagging of individual frames but the reliable detection, localization, and reporting of a coherent anomalous event: a contiguous temporal episode with an identifiable onset and duration. Frame-level metrics are blind to this distinction and, as a result, systematically overestimate model performance for any deployment that requires actionable, event-level alerts. In this work, we propose a shift toward an event-centric perspective in VAD. We first audit widely used VAD benchmarks, including…
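To make the frame-level versus event-level distinction concrete, here is a minimal Python sketch. It is not the paper's evaluation protocol: grouping contiguous flagged frames into events, the temporal-IoU matching criterion, the 0.5 threshold, greedy one-to-one matching, and every function name below are illustrative assumptions. The toy example shows a detector whose output flickers inside a single 40-frame anomaly: it looks perfect on frame-level precision, yet it raises eight fragmented alerts, none of which localizes the event.

```python
from typing import List, Optional, Tuple

# Hypothetical event representation: (start_frame, end_frame), end exclusive.
Event = Tuple[int, int]


def frames_to_events(labels: List[int]) -> List[Event]:
    """Group contiguous runs of flagged (1) frames into events."""
    events: List[Event] = []
    start: Optional[int] = None
    for i, flagged in enumerate(labels):
        if flagged and start is None:
            start = i  # an event begins
        elif not flagged and start is not None:
            events.append((start, i))  # the event ends at the first unflagged frame
            start = None
    if start is not None:  # an event still open at the end of the video
        events.append((start, len(labels)))
    return events


def temporal_iou(a: Event, b: Event) -> float:
    """Intersection-over-union of two half-open frame intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def event_precision_recall(pred: List[Event], gt: List[Event],
                           iou_thr: float = 0.5) -> Tuple[float, float]:
    """Greedy one-to-one matching: each ground-truth event is claimed at most once."""
    matched_gt = set()
    tp = 0
    for p in pred:
        best_j, best_iou = None, 0.0
        for j, g in enumerate(gt):
            if j in matched_gt:
                continue
            iou = temporal_iou(p, g)
            if iou > best_iou:
                best_j, best_iou = j, iou
        if best_j is not None and best_iou >= iou_thr:
            matched_gt.add(best_j)
            tp += 1
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    return precision, recall


def frame_precision_recall(pred_labels: List[int],
                           gt_labels: List[int]) -> Tuple[float, float]:
    """Conventional per-frame precision and recall."""
    pairs = list(zip(pred_labels, gt_labels))
    tp = sum(1 for p, g in pairs if p and g)
    fp = sum(1 for p, g in pairs if p and not g)
    fn = sum(1 for p, g in pairs if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # One ground-truth anomaly spanning frames 20..59 of a 100-frame clip.
    gt = [0] * 20 + [1] * 40 + [0] * 40
    # A flickering detector: flags the event but drops every 5th frame inside it.
    pred = [1 if (20 <= i < 60 and i % 5 != 0) else 0 for i in range(100)]

    print(frame_precision_recall(pred, gt))  # (1.0, 0.8)
    pred_events = frames_to_events(pred)
    print(len(pred_events))                  # 8 separate alerts for one incident
    print(event_precision_recall(pred_events, frames_to_events(gt)))  # (0.0, 0.0)
```

Each fragment covers only 4 of the 40 anomalous frames (temporal IoU 0.1), so under this assumed matching rule no alert counts as a correct detection, even though every flagged frame is individually correct. Other matching rules (for example, a lower IoU threshold or onset-based criteria) would give different numbers; the point is only that frame-level scores can hide fragmented, non-actionable alerts.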
