Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges

Tongtong Yuan; Xuange Zhang; Kun Liu; Bo Liu; Chen Chen; Jian Jin; Zhenzhen Jiao

arXiv:2309.13925·cs.CV·December 5, 2023

Open Access

TL;DR

This paper introduces a new multimodal surveillance video dataset with detailed annotations, benchmarks state-of-the-art models on multiple tasks, and highlights the challenges and potential for improved AI understanding in surveillance scenarios.

Contribution

The paper presents the first annotated multimodal surveillance dataset, UCA, and benchmarks models for video-and-language understanding in surveillance, revealing new challenges and opportunities.

Findings

01. Mainstream models perform poorly on surveillance data

02. Multimodal learning improves anomaly detection

03. Constructed dataset enables future research in surveillance AI

Abstract

Surveillance videos are an essential part of daily life, with critical applications, particularly in public security. However, current surveillance video tasks mainly focus on classifying and localizing anomalous events. Although existing methods achieve considerable performance, they are limited to detecting and classifying predefined events and offer unsatisfactory semantic understanding. To address this issue, we propose a new research direction, surveillance video-and-language understanding, and construct the first multimodal surveillance video dataset. We manually annotate the real-world surveillance dataset UCF-Crime with fine-grained event content and timing. Our newly annotated dataset, UCA (UCF-Crime Annotation), contains 23,542 sentences with an average length of 20 words, and its annotated videos total 110.7 hours. Furthermore, we benchmark SOTA…

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Data-Driven Disease Surveillance · Viral Infections and Outbreaks Research
