Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding

Aaron Lohner; Francesco Compagno; Jonathan Francis; Alessandro Oltramari

arXiv:2407.05910·cs.CV·January 10, 2025

TL;DR

This paper introduces a multimodal approach that uses scene graphs combined with visual and textual data to classify traffic accident types, improving accuracy on a traffic anomaly benchmark.

Contribution

The paper presents a multi-stage pipeline that encodes traffic scenes as scene graphs and fuses those graphs with visual and textual representations for accident classification.

Findings

01 Achieved 57.77% balanced accuracy on the DoTA benchmark

02 Scene graph integration improves classification performance by nearly 5 percentage points

03 Demonstrates the effectiveness of multimodal fusion for traffic accident understanding
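The headline figure is a balanced accuracy, which is the mean of the per-class recalls rather than the share of correct predictions overall. The sketch below shows why the distinction matters when accident types are unevenly represented. The class names and labels are hypothetical and are not taken from the paper or the benchmark.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, computed over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical labels: a classifier that always predicts the majority class.
y_true = ["rear_end"] * 4 + ["side_impact"]
y_pred = ["rear_end"] * 5

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 0.8
balanced = balanced_accuracy(y_true, y_pred)                        # 0.5
```

In this toy case the plain accuracy is 0.8 while the balanced accuracy is 0.5, because the minority class is never recognized.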

Abstract

Recognizing a traffic accident is an essential part of any autonomous driving or road monitoring system. An accident can appear in a wide variety of forms, and understanding what type of accident is taking place may be useful to prevent it from recurring. This work focuses on classifying traffic scenes into specific accident types. We approach the problem by representing a traffic scene as a graph, where objects such as cars can be represented as nodes, and relative distances and directions between them as edges. This representation of a traffic scene is referred to as a scene graph, and can be used as input for an accident classifier. Better results are obtained with a classifier that fuses the scene graph input with visual and textual representations. This work introduces a multi-stage, multimodal pipeline that pre-processes videos of traffic accidents, encodes them as scene graphs,…
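As a rough illustration of the graph representation described in the abstract, the sketch below builds a scene graph for a single frame: detected objects become nodes, and each edge carries a coarse relative direction and distance. It is not the authors' pipeline. The `SceneObject` type, the four direction labels, the distance thresholds, and the `max_dist` cutoff are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """One detected object in a frame (hypothetical structure)."""
    obj_id: int
    label: str   # e.g. "car", "pedestrian"
    x: float     # position in an assumed top-down frame, in metres
    y: float


def direction_label(dx: float, dy: float) -> str:
    """Coarse direction of the target relative to the source (assumed 4-way scheme)."""
    if abs(dx) >= abs(dy):
        return "right_of" if dx > 0 else "left_of"
    return "in_front_of" if dy > 0 else "behind"


def distance_label(d: float, near: float = 5.0, mid: float = 15.0) -> str:
    """Bucket a distance in metres; the thresholds are illustrative assumptions."""
    if d < near:
        return "near"
    if d < mid:
        return "mid"
    return "far"


def build_scene_graph(objects, max_dist: float = 30.0):
    """Return (nodes, edges) for one frame.

    nodes maps obj_id -> label. Each edge is a tuple
    (source_id, direction, distance_bucket, target_id), where the direction
    describes the target as seen from the source. Pairs farther apart than
    max_dist get no edge.
    """
    nodes = {o.obj_id: o.label for o in objects}
    edges = []
    for src in objects:
        for dst in objects:
            if src.obj_id == dst.obj_id:
                continue
            dx, dy = dst.x - src.x, dst.y - src.y
            d = math.hypot(dx, dy)
            if d <= max_dist:
                edges.append(
                    (src.obj_id, direction_label(dx, dy), distance_label(d), dst.obj_id)
                )
    return nodes, edges


frame = [
    SceneObject(0, "car", 0.0, 0.0),
    SceneObject(1, "car", 3.0, 0.0),
    SceneObject(2, "pedestrian", 0.0, 40.0),
]
nodes, edges = build_scene_graph(frame)
```

For a video, one such graph per frame would give a sequence of graphs that a downstream classifier could consume, optionally alongside visual and textual features as the abstract describes.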

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Topic Modeling

Methods: Focus · ALIGN