Multi-modal Sensor Fusion-Based Deep Neural Network for End-to-end Autonomous Driving with Scene Understanding
Zhiyu Huang, Chen Lv, Yang Xing, Jingda Wu

TL;DR
This paper presents a deep neural network that fuses visual and depth data for end-to-end autonomous driving, improving scene understanding and control in urban simulations with better generalization and success rates.
Contribution
It introduces a multimodal sensor fusion approach integrated with scene understanding for end-to-end autonomous driving, demonstrating enhanced performance and robustness.
Findings
Achieved a 100% success rate in static navigation tasks in simulation, in both training and unobserved situations.
Reported higher success rates than the compared benchmark models on the remaining CoRL2017 and NoCrash tasks.
Sensor fusion and scene understanding are critical for robust autonomous driving.
Abstract
This study aims to improve the performance and generalization capability of end-to-end autonomous driving with scene understanding by leveraging deep learning and multimodal sensor fusion. The designed end-to-end deep neural network takes the visual image and its associated depth information as input at an early fusion level, and concurrently outputs pixel-wise semantic segmentation (as scene understanding) and vehicle control commands. The model is tested in high-fidelity simulated urban driving conditions and compared against the CoRL2017 and NoCrash benchmarks. The test results show that the proposed approach performs and generalizes better, achieving a 100% success rate in static navigation tasks in both training and unobserved situations, as well as better success rates in other tasks than the prior…
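To make the data flow in the abstract concrete, here is a minimal NumPy sketch of early fusion feeding one shared encoder with two output heads. It is an illustration under stated assumptions, not the authors' architecture: the layer sizes, the number of semantic classes, the three-value control vector, the random weights, and all function names (early_fusion, init_params, forward) are hypothetical, and per-pixel linear maps stand in for the convolutional layers a real model would use.

```python
import numpy as np

NUM_CLASSES = 5    # hypothetical number of semantic classes
NUM_CONTROLS = 3   # hypothetical control vector size (not given in the abstract)
HIDDEN = 8         # hypothetical width of the shared feature map


def early_fusion(rgb, depth):
    """Stack an (H, W, 3) image and an (H, W) depth map into (H, W, 4)."""
    return np.concatenate([rgb, depth[..., None]], axis=-1)


def init_params(rng):
    """Random weights standing in for a trained network (assumption)."""
    return {
        "enc": rng.normal(size=(4, HIDDEN)),
        "seg": rng.normal(size=(HIDDEN, NUM_CLASSES)),
        "ctl": rng.normal(size=(HIDDEN, NUM_CONTROLS)),
    }


def forward(params, rgb, depth):
    """Return (per-pixel class map, control vector) from one shared encoder."""
    x = early_fusion(rgb, depth)                  # (H, W, 4) fused input
    feats = np.maximum(x @ params["enc"], 0.0)    # shared features, (H, W, HIDDEN)
    seg_logits = feats @ params["seg"]            # (H, W, NUM_CLASSES)
    seg_map = seg_logits.argmax(axis=-1)          # pixel-wise class ids, (H, W)
    pooled = feats.mean(axis=(0, 1))              # global average pool, (HIDDEN,)
    controls = np.tanh(pooled @ params["ctl"])    # (NUM_CONTROLS,), each in [-1, 1]
    return seg_map, controls


rng = np.random.default_rng(0)
rgb = rng.random((6, 8, 3))     # toy 6x8 image
depth = rng.random((6, 8))      # toy depth map aligned with the image
params = init_params(rng)
seg_map, controls = forward(params, rgb, depth)
print(seg_map.shape, controls.shape)
```

The point of the sketch is the structure: depth joins the image as a fourth input channel before any features are computed (early fusion), and both the segmentation map and the control vector are read from the same features in a single forward pass.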
