Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges
Tongtong Yuan, Xuange Zhang, Kun Liu, Bo Liu, Chen Chen, Jian Jin, Zhenzhen Jiao

TL;DR
This paper introduces a new multimodal surveillance video dataset with detailed annotations, benchmarks state-of-the-art models on multiple tasks, and highlights the challenges and potential for improved AI understanding in surveillance scenarios.
Contribution
The paper presents the first annotated multimodal surveillance dataset, UCA, and benchmarks models for video-and-language understanding in surveillance, revealing new challenges and opportunities.
Findings
Mainstream models perform poorly on surveillance data
Multimodal learning improves anomaly detection
Constructed dataset enables future research in surveillance AI
Abstract
Surveillance videos are an essential component of daily life with various critical applications, particularly in public security. However, current surveillance video tasks mainly focus on classifying and localizing anomalous events. Existing methods are limited to detecting and classifying the predefined events with unsatisfactory semantic understanding, although they have obtained considerable performance. To address this issue, we propose a new research direction of surveillance video-and-language understanding, and construct the first multimodal surveillance video dataset. We manually annotate the real-world surveillance dataset UCF-Crime with fine-grained event content and timing. Our newly annotated dataset, UCA (UCF-Crime Annotation), contains 23,542 sentences, with an average length of 20 words, and its annotated videos are as long as 110.7 hours. Furthermore, we benchmark SOTA…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Data-Driven Disease Surveillance · Viral Infections and Outbreaks Research
