Edge-Based Multimodal Sensor Data Fusion with Vision Language Models (VLMs) for Real-time Autonomous Vehicle Accident Avoidance

Fengze Yang; Bo Yu; Yang Zhou; Xuewen Luo; Zhengzhong Tu; Chenxi Liu

arXiv:2508.01057 · cs.AI · August 13, 2025

Open Access

TL;DR

This paper introduces REACT, a real-time, edge-optimized V2X trajectory planning framework using lightweight vision-language models to enhance autonomous vehicle safety through multimodal sensor fusion and contextual reasoning.

Contribution

It presents a novel, lightweight VLM-based framework with edge-adaptation strategies for real-time multimodal fusion and trajectory optimization in autonomous driving.

Findings

01

Achieves a 77% reduction in collision rate on the DeepAccident benchmark.

02

Attains 48.2% Video Panoptic Quality (VPQ).

03

Runs with 0.57-second inference latency on an NVIDIA Jetson AGX Orin.
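The Findings quote two metrics without defining them. The Python below is a minimal sketch of how such numbers are commonly computed, under stated assumptions. The 77% figure is assumed to be a relative reduction against some baseline's collision rate (this page does not name the baseline). `vpq` is a simplified panoptic-quality score that pools true positives (IoU > 0.5), false positives, and false negatives over frames. It scores each frame independently and omits the temporal identity-consistency check that video panoptic metrics add, so it is not the DeepAccident evaluation code. All function and type names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]          # one BEV grid cell (row, col)
Frame = Dict[int, Set[Cell]]    # instance id -> cells it occupies


def relative_reduction(baseline_rate: float, new_rate: float) -> float:
    """Fractional drop of a rate relative to a baseline (assumed definition)."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - new_rate) / baseline_rate


def mask_iou(a: Set[Cell], b: Set[Cell]) -> float:
    """Intersection-over-union of two instance masks."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def vpq(pred: List[Frame], gt: List[Frame], thresh: float = 0.5) -> float:
    """Simplified VPQ: PQ ratio with TP/FP/FN pooled over all frames."""
    iou_sum, tp, fp, fn = 0.0, 0, 0, 0
    for p_frame, g_frame in zip(pred, gt):
        matched: Set[int] = set()
        for g_mask in g_frame.values():
            # Best still-unmatched prediction for this ground-truth instance.
            best_id: Optional[int] = None
            best_iou = 0.0
            for p_id, p_mask in p_frame.items():
                if p_id in matched:
                    continue
                score = mask_iou(p_mask, g_mask)
                if score > best_iou:
                    best_id, best_iou = p_id, score
            if best_id is not None and best_iou > thresh:
                matched.add(best_id)
                iou_sum += best_iou
                tp += 1
            else:
                fn += 1  # ground truth missed in this frame
        fp += len(p_frame) - len(matched)  # predictions left unmatched
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0
```

Greedy matching is adequate here because, with non-overlapping masks and an IoU threshold above 0.5, each ground-truth instance can match at most one prediction.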

Abstract

Autonomous driving (AD) systems relying solely on onboard sensors may fail to detect distant or obstacle hazards, potentially causing preventable collisions. Existing transformer-based Vehicle-to-Everything (V2X) approaches mitigate these sensing limitations, but they either lack effective multimodal fusion and reasoning or struggle to meet real-time performance requirements under complex, high-dimensional traffic conditions. This paper proposes the Real-time Edge-based Autonomous Co-pilot Trajectory planner (REACT), a V2X-integrated trajectory optimization framework for AD based on a fine-tuned lightweight Vision-Language Model (VLM). REACT integrates infrastructure-provided hazard alerts with onboard sensor data, capturing intricate surrounding traffic dynamics and vehicle intents through visual embeddings, interpreting precise numerical data from symbolic inputs, and employing…
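The abstract says REACT interprets precise numerical data from symbolic inputs alongside visual embeddings, but this page does not show the input format. As a purely hypothetical illustration, the sketch below serializes an infrastructure hazard alert and the ego state into text that could accompany camera frames in a VLM prompt. The `HazardAlert` and `EgoState` types, the `<image_i>` placeholders, the field units, and the waypoint instruction are all invented here and are not REACT's interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HazardAlert:
    """Hypothetical V2X hazard message from roadside infrastructure."""
    hazard_type: str
    x_m: float        # longitudinal offset from the ego vehicle, metres
    y_m: float        # lateral offset from the ego vehicle, metres
    speed_mps: float  # hazard speed, metres per second


@dataclass
class EgoState:
    """Hypothetical onboard ego-vehicle state."""
    speed_mps: float
    heading_deg: float


def build_prompt(ego: EgoState, alerts: List[HazardAlert],
                 n_images: int, n_waypoints: int = 6) -> str:
    """Render symbolic V2X and ego data as text next to image placeholders."""
    lines = [f"<image_{i}>" for i in range(n_images)]  # hypothetical image slots
    lines.append(f"Ego: speed {ego.speed_mps:.1f} m/s, heading {ego.heading_deg:.1f} deg.")
    if not alerts:
        lines.append("No infrastructure hazard alerts.")
    for k, a in enumerate(alerts, start=1):
        # Fixed precision and explicit units keep numeric values unambiguous.
        lines.append(
            f"Alert {k}: {a.hazard_type} at ({a.x_m:.1f} m, {a.y_m:.1f} m), "
            f"speed {a.speed_mps:.1f} m/s."
        )
    lines.append(f"Task: output the next {n_waypoints} ego waypoints as (x, y) in metres.")
    return "\n".join(lines)
```

Spelling out units and fixing decimal precision is one plausible way to pass symbolic V2X data to a language model; the encoding REACT actually uses is not shown on this page.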

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning