VIDI: A Video Dataset of Incidents
Duygu Sesver, Alp Eren Gençoğlu, Çağrı Emre Yıldız, Zehra Günindi, Faeze Habibi, Ziya Ata Yazıcı, Hazım Kemal Ekenel

TL;DR
This paper introduces VIDI, a video dataset of 4,534 clips spanning 43 incident categories, and shows that using video rather than still images raises incident classification accuracy from 67.37% to 76.56%.
Contribution
The paper presents VIDI, a new and diverse video dataset for incident detection, and benchmarks Vision Transformer and TimeSformer on it to show the benefit of video data for classification.
Findings
Video data improves incident classification accuracy from 67.37% to 76.56%.
Recent models like Vision Transformer and TimeSformer outperform previous approaches.
The VIDI dataset will be made publicly available for further research.
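The accuracy figures in the findings above can be read either as an absolute gain in percentage points or as a relative improvement over the still-image baseline. The short Python sketch below works out both numbers from the two reported accuracies. The function and variable names are illustrative assumptions and do not come from the paper.

```python
def accuracy_gain(baseline_pct: float, improved_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = improved_pct - baseline_pct
    relative = 100.0 * absolute / baseline_pct
    return absolute, relative


# Accuracies reported in the findings; the names are hypothetical labels.
image_acc = 67.37  # classification accuracy with still images (%)
video_acc = 76.56  # classification accuracy with video data (%)

abs_gain, rel_gain = accuracy_gain(image_acc, video_acc)
print(f"absolute gain: {abs_gain:.2f} percentage points")
print(f"relative gain: {rel_gain:.1f}%")
```

Under these numbers the gain is 9.19 percentage points, or roughly a 13.6% relative improvement over the still-image baseline.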
Abstract
Automatic detection of natural disasters and incidents has become more important as a tool for fast response. There have been many studies to detect incidents using still images and text. However, the number of approaches that exploit temporal information is rather limited. One of the main reasons for this is that a diverse video dataset with various incident types does not exist. To address this need, in this paper we present a video dataset, Video Dataset of Incidents, VIDI, that contains 4,534 video clips corresponding to 43 incident categories. Each incident class has around 100 videos with a duration of ten seconds on average. To increase diversity, the videos have been searched in several languages. To assess the performance of the recent state-of-the-art approaches, Vision Transformer and TimeSformer, as well as to explore the contribution of video-based information for incident…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Data-Driven Disease Surveillance · Video Analysis and Summarization
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Absolute Position Encodings · Dropout · Adam · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing
