Multimodal Framework for Explainable Autonomous Driving: Integrating Video, Sensor, and Textual Data for Enhanced Decision-Making and Transparency
Abolfazl Zarghani, Amirhossein Ebrahimi, Amir Malekesfandiari

TL;DR
This paper presents a multimodal framework integrating video, sensor, and textual data to improve autonomous driving decision-making and generate human-readable explanations, enhancing transparency and trust in AV systems.
Contribution
It introduces a novel multimodal approach combining VideoMAE, sensor fusion, and BERT for interpretable autonomous driving, demonstrating superior performance on benchmark datasets.
Findings
Achieved 92.5% action prediction accuracy
Reduced training loss from 5.7231 to 0.0187
Generated explanations with a BLEU-4 score of 0.75
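For readers unfamiliar with the metric in the last finding, BLEU-4 measures how closely generated text matches a reference through clipped 1- to 4-gram precision and a brevity penalty. The sketch below is a minimal, unsmoothed, single-reference, sentence-level version written for illustration. The function names and example sentences are hypothetical, and the paper's reported 0.75 was presumably computed over a full test set with an established toolkit, not with this code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu4(candidate, reference):
    """Sentence-level BLEU-4 against a single reference, without smoothing.

    Illustrative only: whitespace tokenization, one reference, and a score
    of 0.0 whenever any n-gram order has no match.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, 5):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the four precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / 4)


# Hypothetical explanation pair, not taken from the paper's data.
reference = "the car slows down because the light is red"
candidate = "the car stops because the light is red"
score = bleu4(candidate, reference)
```

An identical candidate and reference score 1.0, and a candidate sharing no 4-gram with the reference scores 0.0, so short explanations are usually evaluated with smoothing or at corpus level.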
Abstract
Autonomous vehicles (AVs) are poised to redefine transportation by enhancing road safety, minimizing human error, and optimizing traffic efficiency. The success of AVs depends on their ability to interpret complex, dynamic environments through diverse data sources, including video streams, sensor measurements, and contextual textual information. However, seamlessly integrating these multimodal inputs and ensuring transparency in AI-driven decisions remain formidable challenges. This study introduces a novel multimodal framework that synergistically combines video, sensor, and textual data to predict driving actions while generating human-readable explanations, fostering trust and regulatory compliance. By leveraging VideoMAE for spatiotemporal video analysis, a custom sensor fusion module for real-time data processing, and BERT for textual comprehension, our approach achieves robust…
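The general idea of combining three modality encoders into one action prediction can be sketched as late fusion by concatenation. This is a toy illustration under stated assumptions, not the authors' architecture: the summary does not say how the modalities are fused, the action labels are hypothetical, and the random vectors stand in for embeddings that VideoMAE, the sensor module, and BERT would produce.

```python
import math
import random

# Hypothetical action set; the paper's actual label space is not given here.
ACTIONS = ["accelerate", "brake", "turn_left", "turn_right"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_predict(video_emb, sensor_emb, text_emb, weights, bias):
    """Concatenate modality embeddings and apply one linear layer.

    Concatenation is an assumed fusion strategy chosen for simplicity.
    """
    fused = video_emb + sensor_emb + text_emb  # list concatenation
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ACTIONS[best], probs


# Placeholder embeddings and untrained weights, for shape illustration only.
rng = random.Random(0)
video_emb = [rng.gauss(0, 1) for _ in range(8)]   # stand-in for a video encoder output
sensor_emb = [rng.gauss(0, 1) for _ in range(4)]  # stand-in for fused sensor features
text_emb = [rng.gauss(0, 1) for _ in range(8)]    # stand-in for a text encoder output

fused_dim = len(video_emb) + len(sensor_emb) + len(text_emb)
weights = [[rng.gauss(0, 0.1) for _ in range(fused_dim)] for _ in ACTIONS]
bias = [0.0] * len(ACTIONS)

action, probs = fuse_and_predict(video_emb, sensor_emb, text_emb, weights, bias)
```

In a trained system the weights would be learned jointly with, or on top of, the encoders, and a separate decoder would generate the textual explanation.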
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety
