InterFuserDVS: Event-Enhanced Sensor Fusion for Safe RL-Based Decision Making
Mustafa Sakhai, Kaung Sithu, Min Khant Soe Oke, Maciej Wielgosz

TL;DR
This paper enhances autonomous driving perception by integrating Dynamic Vision Sensors (DVS) with RGB cameras and LiDAR through a novel transformer-based fusion strategy, improving robustness in challenging conditions such as high-dynamic-range scenes.
Contribution
It introduces a new token-based fusion strategy for combining DVS with RGB and LiDAR in an extended InterFuser model for autonomous driving.
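A token-based fusion of this kind can be pictured as follows. This is a generic sketch under stated assumptions, not the authors' implementation. Each modality's backbone feature map is flattened into one token per spatial cell, projected to a shared width, tagged with a per-modality embedding, and concatenated into a single sequence for a transformer encoder. The channel counts, grid sizes, and the names `to_tokens`, `fuse_tokens`, `proj`, and `modality_emb` are hypothetical. Random arrays stand in for learned weights, and positional encodings and the encoder itself are omitted.

```python
import numpy as np

def to_tokens(feature_map: np.ndarray) -> np.ndarray:
    """Flatten a (C, H, W) feature map into (H*W, C) tokens, one per spatial cell."""
    c, h, w = feature_map.shape
    return feature_map.reshape(c, h * w).T

def fuse_tokens(feature_maps: dict, proj: dict, modality_emb: dict) -> np.ndarray:
    """Project each modality's tokens to a shared width D, add that modality's
    embedding, and concatenate all modalities into one token sequence."""
    sequence = []
    for name, fmap in feature_maps.items():
        tokens = to_tokens(fmap) @ proj[name]   # (N_m, C_m) @ (C_m, D) -> (N_m, D)
        tokens = tokens + modality_emb[name]    # (D,) broadcast over all N_m tokens
        sequence.append(tokens)
    return np.concatenate(sequence, axis=0)     # (sum of N_m, D)

rng = np.random.default_rng(0)
D = 8  # hypothetical shared token width
# Hypothetical backbone outputs; shapes are illustrative only.
feature_maps = {
    "rgb": rng.standard_normal((16, 4, 4)),
    "lidar": rng.standard_normal((32, 3, 3)),
    "dvs": rng.standard_normal((16, 4, 4)),
}
# Random stand-ins for learned projection matrices and modality embeddings.
proj = {name: rng.standard_normal((f.shape[0], D)) for name, f in feature_maps.items()}
modality_emb = {name: rng.standard_normal(D) for name in feature_maps}

seq = fuse_tokens(feature_maps, proj, modality_emb)
print(seq.shape)  # (41, 8): 16 RGB + 9 LiDAR + 16 DVS tokens
```

Tagging tokens with a modality embedding lets the attention layers tell DVS tokens apart from RGB and LiDAR tokens after concatenation. Whether the extended InterFuser uses exactly this scheme is not stated in this summary.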
Findings
DVS integration improves robustness in high-dynamic-range scenes.
Achieved a Driving Score of 77.2 on CARLA benchmarks.
Route Completion reached 100%, outperforming some baselines.
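The two headline numbers are linked through the CARLA Leaderboard's Driving Score. As defined for Leaderboard 1.0, each route scores its route completion (in %) times an infraction penalty. That penalty is the product of a per-infraction coefficient raised to the number of occurrences, and the final score is the mean over routes. The sketch below assumes the Leaderboard 1.0 penalty coefficients; the route data are invented for illustration and are not results from the paper.

```python
from math import prod

# Assumption: penalty coefficients as listed for CARLA Leaderboard 1.0;
# check them against the leaderboard version actually used.
PENALTIES = {
    "collision_pedestrian": 0.50,
    "collision_vehicle": 0.60,
    "collision_static": 0.65,
    "red_light": 0.70,
    "stop_sign": 0.80,
}

def infraction_penalty(infractions: dict) -> float:
    """P_i: product of each coefficient raised to its occurrence count (1.0 if none)."""
    return prod(PENALTIES[kind] ** count for kind, count in infractions.items())

def driving_score(routes: list) -> float:
    """Mean over routes of route completion (%) times infraction penalty."""
    scores = [completion * infraction_penalty(inf) for completion, inf in routes]
    return sum(scores) / len(scores)

# Hypothetical routes, not data from the paper: every route is fully completed,
# yet infractions still pull the score below 100.
routes = [
    (100.0, {}),
    (100.0, {"red_light": 1}),
    (100.0, {"collision_vehicle": 1}),
]
print(round(driving_score(routes), 2))  # 76.67
```

Under this definition, 100% route completion on every route together with a Driving Score of 77.2 implies an average infraction penalty of about 0.772. The gap to 100 therefore comes from infractions, not from unfinished routes.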
Abstract
Autonomous driving systems rely heavily on robust sensor fusion to perceive complex environments. Traditional setups using RGB cameras and LiDAR often struggle in high-dynamic-range scenes or high-speed scenarios due to motion blur and latency. Dynamic Vision Sensors (DVS), or event cameras, offer a paradigm shift by capturing asynchronous brightness changes with microsecond temporal resolution and high dynamic range. In this paper, we propose an extended architecture of the state-of-the-art InterFuser model, integrating DVS as an additional modality to enhance perception reliability. We introduce a novel token-based fusion strategy that incorporates accumulated event frames into the transformer-based backbone of InterFuser. Our method leverages the complementary nature of RGB, LiDAR, and DVS data. We evaluate our approach on the Car Learning to Act (CARLA) Leaderboard benchmarks,…
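The accumulated event frames mentioned in the abstract can be illustrated with a common representation. Each event is an (x, y, timestamp, polarity) tuple, and events are counted per pixel within a fixed time window, with separate channels for positive and negative polarity. This is a sketch under that assumption. The paper's window length, normalization, and channel layout are not given here, and the name `accumulate_events` is hypothetical.

```python
from typing import List, Tuple

# (x, y, timestamp in seconds, polarity: True = brightness increase)
Event = Tuple[int, int, float, bool]

def accumulate_events(events: List[Event], width: int, height: int,
                      t_start: float, t_end: float) -> List[List[List[int]]]:
    """Count events per pixel in [t_start, t_end), one channel per polarity.

    Returns frame[channel][y][x], where channel 0 holds positive events
    and channel 1 holds negative events.
    """
    frame = [[[0] * width for _ in range(height)] for _ in range(2)]
    for x, y, t, pol in events:
        # Keep only events inside the time window and the sensor bounds.
        if t_start <= t < t_end and 0 <= x < width and 0 <= y < height:
            channel = 0 if pol else 1
            frame[channel][y][x] += 1
    return frame

events = [
    (1, 0, 0.001, True),
    (1, 0, 0.002, True),
    (2, 1, 0.004, False),
    (0, 1, 0.020, True),  # falls outside the 10 ms window, so it is ignored
]
frame = accumulate_events(events, width=3, height=2, t_start=0.0, t_end=0.010)
print(frame[0])  # [[0, 2, 0], [0, 0, 0]]
print(frame[1])  # [[0, 0, 0], [0, 0, 1]]
```

The resulting 2 × H × W count frame is image-like. This is what lets it pass through a convolutional backbone and then be tokenized alongside RGB and LiDAR features.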
