M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving

Dongyang Xu; Haokun Li; Qingfan Wang; Ziying Song; Lei Chen; Hanming Deng

arXiv:2403.12552 · cs.CV · March 20, 2024 · 6 cites

TL;DR

This paper introduces M2DA, a multi-modal fusion transformer with driver attention that improves environment perception and scene understanding in autonomous driving, achieving state-of-the-art results in simulation.

Contribution

The paper proposes a novel Lidar-Vision-Attention Fusion module and incorporates driver attention to enhance multi-modal data integration and human-like scene understanding.

Findings

01. Achieves state-of-the-art performance on CARLA benchmarks

02. Requires less training data than existing methods

03. Enhances safety by accurately identifying critical agents

Abstract

End-to-end autonomous driving has witnessed remarkable progress. However, the extensive deployment of autonomous vehicles has yet to be realized, primarily due to 1) inefficient multi-modal environment perception: how to integrate data from multi-modal sensors more efficiently; 2) non-human-like scene understanding: how to effectively locate and predict critical risky agents in traffic scenarios like an experienced driver. To overcome these challenges, in this paper, we propose a Multi-Modal fusion transformer incorporating Driver Attention (M2DA) for autonomous driving. To better fuse multi-modal data and achieve higher alignment between different modalities, a novel Lidar-Vision-Attention-based Fusion (LVAFusion) module is proposed. By incorporating driver attention, we give autonomous vehicles a human-like scene understanding ability to identify crucial areas within complex…

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · EEG and Brain-Computer Interfaces · Gaze Tracking and Assistive Technology

Methods: Entropy Regularization · Proximal Policy Optimization · CARLA: An Open Urban Driving Simulator