Simplifying Traffic Anomaly Detection with Video Foundation Models

Svetlana Orlova; Tommie Kerssies; Brunó B. Englert; Gijs Dubbelman

arXiv:2507.09338 · cs.CV · September 3, 2025

TL;DR

This paper demonstrates that simple encoder-only Video Vision Transformers, when properly pre-trained, can effectively and efficiently perform traffic anomaly detection, challenging the need for complex architectures.

Contribution

The paper shows that advanced pre-training enables simple encoder-only models to match or even surpass complex methods in traffic anomaly detection, emphasizing the importance of pre-training strategies.

Findings

01

Pre-trained simple models match or surpass complex state-of-the-art methods.

02

Self-supervised Masked Video Modeling is the most effective pre-training approach for Traffic Anomaly Detection (TAD).

03

Domain-Adaptive Pre-Training improves downstream performance without labeled anomalies.
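
To make the second finding more concrete, the snippet below sketches the masking step that Masked Video Modeling relies on: most spatio-temporal patches are hidden and the encoder sees only the rest. This is an illustration under stated assumptions, not the authors' implementation. The tube-masking scheme (one spatial mask repeated over time), the function name `tube_mask`, the 14x14 patch grid, and the 0.9 mask ratio are all hypothetical choices made for the example; the summary above does not specify any of them.

```python
import random


def tube_mask(num_frames, grid_h, grid_w, mask_ratio=0.9, seed=None):
    """Return a [num_frames][grid_h * grid_w] boolean mask (True = patch hidden).

    Assumption: "tube" masking, i.e. the same spatial positions are hidden in
    every temporal slot, so a masked patch cannot be copied from a neighbouring
    frame. Other masking schemes exist; this is only one possible choice.
    """
    rng = random.Random(seed)
    num_patches = grid_h * grid_w
    num_masked = int(round(mask_ratio * num_patches))

    # Pick the spatial positions to hide once.
    masked_positions = set(rng.sample(range(num_patches), num_masked))
    frame_mask = [i in masked_positions for i in range(num_patches)]

    # Repeat the same spatial mask along the time axis to form tubes.
    return [list(frame_mask) for _ in range(num_frames)]


# Hypothetical settings: 8 temporal slots, 14x14 patches per slot.
mask = tube_mask(num_frames=8, grid_h=14, grid_w=14, mask_ratio=0.9, seed=0)
visible_per_frame = sum(not hidden for hidden in mask[0])
```

In a masked-modeling setup of this kind, only the visible patches would be passed to the encoder, and a reconstruction objective would be computed on the hidden ones; with the assumed settings, 20 of the 196 patches per temporal slot stay visible.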

Abstract

Recent methods for ego-centric Traffic Anomaly Detection (TAD) often rely on complex multi-stage or multi-representation fusion architectures, yet it remains unclear whether such complexity is necessary. Recent findings in visual perception suggest that foundation models, enabled by advanced pre-training, allow simple yet flexible architectures to outperform specialized designs. Therefore, in this work, we investigate an architecturally simple encoder-only approach using plain Video Vision Transformers (Video ViTs) and study how pre-training enables strong TAD performance. We find that: (i) advanced pre-training enables simple encoder-only models to match or even surpass the performance of specialized state-of-the-art TAD methods, while also being significantly more efficient; (ii) although weakly- and fully-supervised pre-training are advantageous on standard benchmarks, we find them…

Code & Models

Models

tue-mps/simple-tad (Hugging Face model)