TL;DR
This paper presents a modular zero-shot pipeline for detecting, localizing, and classifying traffic accidents in surveillance videos without using labeled training data, leveraging pre-trained models and simple signal processing techniques.
Contribution
The authors introduce a novel zero-shot approach that combines peak detection, optical flow analysis, and CLIP-based classification for accident analysis in videos without domain-specific fine-tuning.
Findings
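Localizes the collision in time by peak detection on z-score normalized frame-difference signals.
Locates the impact point as the weighted centroid of cumulative Farneback dense optical flow magnitude maps.
Classifies collision type zero-shot by cosine similarity between CLIP image embeddings of frames near the detected peak and multi-prompt text embeddings for each category.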
Abstract
We describe a zero-shot pipeline developed for the ACCIDENT @ CVPR 2026 challenge. The challenge requires predicting when, where, and what type of traffic accident occurs in surveillance video, without labeled real-world training data. Our method separates the problem into three independent modules. The first module localizes the collision in time by running peak detection on z-score normalized frame-difference signals. The second module finds the impact location by computing the weighted centroid of cumulative dense optical flow magnitude maps using the Farneback algorithm. The third module classifies collision type by measuring cosine similarity between CLIP image embeddings of frames near the detected peak and text embeddings built from multi-prompt natural language descriptions of each collision category. No domain-specific fine-tuning is involved; the pipeline processes each video…
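The three modules described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names, the z-score threshold of 2.0, the rule of picking the single strongest peak, and the averaging of similarities over prompts are all assumptions made for illustration. The inputs are assumed to be precomputed elsewhere: a per-frame difference signal, a cumulative optical flow magnitude map (for example from the Farneback algorithm), and CLIP image and text embeddings as plain lists of floats.

```python
import math
from statistics import mean, pstdev


def zscore(signal):
    """Normalize a 1D signal to zero mean and unit (population) variance."""
    mu = mean(signal)
    sigma = pstdev(signal)
    if sigma == 0:
        return [0.0] * len(signal)
    return [(x - mu) / sigma for x in signal]


def detect_peak(frame_diffs, threshold=2.0):
    """Return the index of the strongest local maximum whose z-score
    reaches `threshold` (an assumed value), or None if there is none."""
    z = zscore(frame_diffs)
    best = None
    for i in range(1, len(z) - 1):
        is_local_max = z[i] > z[i - 1] and z[i] >= z[i + 1]
        if is_local_max and z[i] >= threshold:
            if best is None or z[i] > z[best]:
                best = i
    return best


def weighted_centroid(magnitude):
    """Return the (x, y) centroid of a 2D magnitude map given as a list
    of rows, weighting each pixel by its value; None if the map is empty."""
    total = sum(sum(row) for row in magnitude)
    if total == 0:
        return None
    cx = sum(x * v for row in magnitude for x, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(magnitude) for v in row) / total
    return cx, cy


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify(image_emb, prompt_embs_by_class):
    """Pick the class whose prompts are, on average, most similar to the
    image embedding. Averaging over prompts is an assumption here."""
    scores = {
        label: mean([cosine(image_emb, p) for p in prompts])
        for label, prompts in prompt_embs_by_class.items()
    }
    return max(scores, key=scores.get)


# Toy inputs (hypothetical values, not data from the paper).
diffs = [1, 1, 1, 1, 1, 10, 1, 1, 1, 1]
flow_mag = [[0, 0, 0], [0, 0, 4], [0, 0, 0]]
prompts = {"rear-end": [[1.0, 0.0], [0.9, 0.1]], "side-impact": [[0.0, 1.0]]}

peak_frame = detect_peak(diffs)
impact_xy = weighted_centroid(flow_mag)
label = classify([1.0, 0.0], prompts)
```

Because each function consumes only the output of a precomputed signal, the temporal, spatial, and classification steps stay independent, which mirrors the modular design the abstract describes. In practice the frame-difference signal, flow maps, and embeddings would come from video decoding, a dense optical flow routine, and a pre-trained CLIP model respectively.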
