ViDAS: Vision-based Danger Assessment and Scoring
Pranav Gupta, Advith Krishnan, Naman Nanda, Ananth Eswar, Deeksha Agarwal, Pratham Gohil, Pratyush Goel

TL;DR
This paper introduces a new dataset of YouTube videos annotated with danger levels by humans, and evaluates how well Large Language Models can replicate these assessments using video summaries, advancing danger analysis in videos.
Contribution
The paper presents a novel dataset for danger assessment in videos and demonstrates the potential of LLMs to perform human-like danger evaluations using multimodal data.
Findings
LLMs can assess danger levels with reasonable accuracy.
The dataset enables benchmarking of danger assessment methods.
LLMs show promise in real-time danger detection in videos.
Abstract
We present a novel dataset aimed at advancing danger analysis and assessment. It addresses the challenge of quantifying danger in video content and of measuring how human-like a Large Language Model (LLM) evaluator is at the same task. The dataset comprises 100 YouTube videos featuring various events. Each video is annotated by human participants, who provide danger ratings on a scale from 0 (no danger to humans) to 10 (life-threatening), with precise timestamps indicating moments of heightened danger. Additionally, we leverage LLMs to independently assess the danger levels in these videos using video summaries. We introduce Mean Squared Error (MSE) scores for multimodal meta-evaluation of the alignment between human and LLM danger assessments. Our dataset not only contributes a new resource for danger assessment in video content but also demonstrates the potential of…
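The MSE comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes one human rating per video (for example, an average over annotators) paired with one LLM rating per video, both on the 0 to 10 scale. The function name `danger_mse` and the sample ratings are hypothetical and not taken from the paper, whose exact aggregation scheme is not described here.

```python
def danger_mse(human_ratings, llm_ratings):
    """Mean squared error between paired danger ratings on the 0-10 scale.

    Assumption: ratings are paired per video; how the paper aggregates
    multiple annotators per video is not specified in this summary.
    """
    if len(human_ratings) != len(llm_ratings):
        raise ValueError("rating lists must have the same length")
    if not human_ratings:
        raise ValueError("at least one rating pair is required")
    for rating in (*human_ratings, *llm_ratings):
        if not 0 <= rating <= 10:
            raise ValueError(f"rating {rating} is outside the 0-10 scale")
    squared_errors = [(h - l) ** 2 for h, l in zip(human_ratings, llm_ratings)]
    return sum(squared_errors) / len(squared_errors)


# Illustrative values only; these are not ratings from the dataset.
human = [2.0, 7.5, 9.0, 0.0]
llm = [3.0, 6.5, 9.0, 1.0]
print(danger_mse(human, llm))  # 0.75
```

A lower score means the LLM's ratings sit closer to the human ratings. Because errors are squared, one large disagreement on a single video weighs more than several small ones.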
Taxonomy
Topics: Anomaly Detection Techniques and Applications
