TL;DR
This paper introduces Video Event Completion (VEC), a novel deep learning approach for video anomaly detection that leverages a visual cloze test to improve localization and semantic understanding of video activities.
Contribution
The paper proposes a new VAD framework using a visual cloze test, combining appearance, motion, and high-level semantics for more accurate and comprehensive anomaly detection.
Findings
VEC outperforms state-of-the-art methods, typically by 1.5%-5% AUROC, on commonly used VAD benchmarks.
Utilizes appearance and motion cues for precise localization.
Incorporates high-level semantics through a visual cloze test.
Abstract
As a vital topic in media content interpretation, video anomaly detection (VAD) has made fruitful progress via deep neural networks (DNNs). However, existing methods usually follow a reconstruction or frame prediction routine. They suffer from two gaps: (1) They cannot localize video activities in a manner that is both precise and comprehensive. (2) They lack sufficient ability to utilize high-level semantics and temporal context information. Inspired by the cloze test frequently used in language study, we propose a brand-new VAD solution named Video Event Completion (VEC) to bridge the gaps above. First, we propose a novel pipeline to achieve both precise and comprehensive enclosure of video activities. Appearance and motion are exploited as mutually complementary cues to localize regions of interest (RoIs). A normalized spatio-temporal cube (STC) is built from each RoI as a video event, which lays…
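To make the STC idea concrete, here is a minimal sketch of how a cube might be built from one RoI and then turned into a cloze-style sample. It is not the authors' implementation. The function names (`resize_nearest`, `build_stc`, `make_cloze`), the 32x32 patch size, the grayscale input, and the reading of "normalized" as "resized to a fixed spatial size" are all assumptions made for illustration, and the RoI box is assumed to come from an upstream localization step that is not shown.

```python
import numpy as np

def resize_nearest(patch, size):
    """Nearest-neighbour resize of an (h, w) patch to (size, size)."""
    h, w = patch.shape
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return patch[rows[:, None], cols[None, :]]

def build_stc(frames, box, size=32):
    """Crop one RoI from every frame and stack the resized patches.

    frames: array of shape (D, H, W), D consecutive grayscale frames.
    box:    (x1, y1, x2, y2) RoI in pixel coordinates (hypothetical input,
            assumed to be produced by an appearance/motion localizer).
    Returns an array of shape (D, size, size).
    """
    x1, y1, x2, y2 = box
    return np.stack([resize_nearest(f[y1:y2, x1:x2], size) for f in frames])

def make_cloze(stc, erased):
    """Split an STC into the visible patches and the erased target patch.

    Assumption: the cloze test hides one temporal patch and asks a model
    to infer it from the remaining ones.
    """
    visible = np.delete(stc, erased, axis=0)
    return visible, stc[erased]

# Toy data: 5 random frames of 120x160 pixels and one hand-picked RoI.
frames = np.random.rand(5, 120, 160)
stc = build_stc(frames, box=(40, 30, 90, 100))
visible, target = make_cloze(stc, erased=2)
print(stc.shape, visible.shape, target.shape)
```

In this sketch a model would receive `visible` as input and be trained to reproduce `target`. At test time, a large error on the inferred patch would be one plausible signal that the event is anomalous.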
