M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving
Dongyang Xu, Haokun Li, Qingfan Wang, Ziying Song, Lei Chen, and Hanming Deng

TL;DR
This paper introduces M2DA, a multi-modal fusion transformer with driver attention that improves environment perception and scene understanding in autonomous driving, achieving state-of-the-art results in simulation.
Contribution
The paper proposes a novel Lidar-Vision-Attention-based Fusion (LVAFusion) module and incorporates driver attention to improve multi-modal data integration and human-like scene understanding.
Findings
Achieves state-of-the-art performance on CARLA benchmarks
Requires less training data than existing methods
Enhances safety by accurately identifying critical agents
Abstract
End-to-end autonomous driving has witnessed remarkable progress. However, the extensive deployment of autonomous vehicles has yet to be realized, primarily due to 1) inefficient multi-modal environment perception: how to integrate data from multi-modal sensors more efficiently; and 2) non-human-like scene understanding: how to effectively locate and predict critical risky agents in traffic scenarios like an experienced driver. To overcome these challenges, we propose a Multi-Modal fusion transformer incorporating Driver Attention (M2DA) for autonomous driving. To better fuse multi-modal data and achieve higher alignment between different modalities, a novel Lidar-Vision-Attention-based Fusion (LVAFusion) module is proposed. By incorporating driver attention, we give autonomous vehicles a human-like scene understanding ability to identify crucial areas within complex…
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · EEG and Brain-Computer Interfaces · Gaze Tracking and Assistive Technology
Methods: Entropy Regularization · Proximal Policy Optimization · CARLA: An Open Urban Driving Simulator
