DELTA: Dense Depth from Events and LiDAR using Transformer's Attention
Vincent Brebion, Julien Moreau, Franck Davoine

TL;DR
This paper introduces DELTA, a neural network that fuses event camera and LiDAR data using self- and cross-attention to produce dense depth maps, reducing errors by up to a factor of four at close ranges compared to the previous state of the art.
Contribution
DELTA is a novel architecture, and one of the few works to combine event and LiDAR data, that uses transformer-style self- and cross-attention for dense depth estimation and sets a new state of the art in event-based depth estimation.
Findings
Reduces depth estimation errors by up to a factor of four at close ranges compared to the previous state of the art.
Outperforms previous state-of-the-art methods in event-based depth estimation.
Demonstrates that self- and cross-attention are effective for modeling spatial and temporal relations within and between event and LiDAR data.
Abstract
Event cameras and LiDARs provide complementary yet distinct data: respectively, asynchronous detections of changes in lighting versus sparse but accurate depth information at a fixed rate. To this day, few works have explored the combination of these two modalities. In this article, we propose a novel neural-network-based method for fusing event and LiDAR data in order to estimate dense depth maps. Our architecture, DELTA, exploits the concepts of self- and cross-attention to model the spatial and temporal relations within and between the event and LiDAR data. Following a thorough evaluation, we demonstrate that DELTA sets a new state of the art in the event-based depth estimation problem, and that it is able to reduce errors by up to a factor of four at close ranges compared to the previous SOTA.
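To make the self- and cross-attention idea in the abstract concrete, here is a minimal NumPy sketch of generic scaled dot-product attention applied to two token sets. It is not DELTA's actual architecture: the token counts, the feature size `d_model`, the random features, and the absence of learned query/key/value projections are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ values, weights


# Hypothetical inputs: random features stand in for encoded data.
rng = np.random.default_rng(0)
d_model = 16                                       # assumed feature size
event_tokens = rng.standard_normal((64, d_model))  # e.g. 64 event patches
lidar_tokens = rng.standard_normal((10, d_model))  # e.g. 10 LiDAR patches

# Self-attention: event tokens attend to each other (relations within a modality).
event_self, _ = attention(event_tokens, event_tokens, event_tokens)

# Cross-attention: event tokens query the LiDAR tokens (relations between modalities).
fused, cross_weights = attention(event_self, lidar_tokens, lidar_tokens)

print(fused.shape, cross_weights.shape)
```

In this sketch each of the 64 event tokens receives a weighted mixture of the 10 LiDAR tokens, which is one simple way a dense representation can draw on sparse depth measurements. A real model would add learned projections, multiple heads, and positional or temporal encodings.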
