Event-based Monocular Dense Depth Estimation with Recurrent Transformers
Xu Liu, Jianing Li, Xiaopeng Fan, Yonghong Tian

TL;DR
This paper introduces EReFormer, an event-based monocular depth estimator built on recurrent transformers. It exploits both spatial and temporal information from asynchronous event streams and is reported to outperform existing methods.
Contribution
The paper presents the first pure transformer-based recursive model for event-based monocular depth estimation, combining spatial and temporal modeling with improved efficiency.
Findings
EReFormer outperforms state-of-the-art methods on synthetic and real datasets.
The recursive transformer mechanism enhances temporal modeling capabilities.
The spatial transformer fusion module improves global context understanding.
Abstract
Event cameras, offering high temporal resolutions and high dynamic ranges, have brought a new perspective to address common challenges (e.g., motion blur and low light) in monocular depth estimation. However, how to effectively exploit the sparse spatial information and rich temporal cues from asynchronous events remains a challenging endeavor. To this end, we propose a novel event-based monocular depth estimator with recurrent transformers, namely EReFormer, which is the first pure transformer with a recursive mechanism to process continuous event streams. Technically, for spatial modeling, a novel transformer-based encoder-decoder with a spatial transformer fusion module is presented, having better global context information modeling capabilities than CNN-based methods. For temporal modeling, we design a gate recurrent vision transformer unit that introduces a recursive mechanism into…
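The recursive idea in the abstract, carrying a hidden state across successive slices of an event stream and gating how much of it is overwritten, can be illustrated with a generic GRU-style update applied per token. This is only a minimal sketch under stated assumptions: it is not the paper's gate recurrent vision transformer unit (the abstract is truncated before the unit is described), and the names `GatedTokenUpdate`, `run_stream`, the random weights, and the token dimension are all hypothetical.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def init_matrix(rows, cols, rng, scale=0.1):
    # Small random weights; a trained model would learn these instead.
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GatedTokenUpdate:
    """Hypothetical GRU-style gate applied independently to each token.

    Assumption: a plain GRU cell stands in for the paper's recurrent unit.
    """

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.W_z = init_matrix(dim, 2 * dim, rng)  # update gate weights
        self.W_r = init_matrix(dim, 2 * dim, rng)  # reset gate weights
        self.W_h = init_matrix(dim, 2 * dim, rng)  # candidate state weights

    def step(self, tokens, hidden):
        new_hidden = []
        for x, h in zip(tokens, hidden):
            z = [sigmoid(v) for v in matvec(self.W_z, x + h)]
            r = [sigmoid(v) for v in matvec(self.W_r, x + h)]
            gated_h = [ri * hi for ri, hi in zip(r, h)]
            cand = [math.tanh(v) for v in matvec(self.W_h, x + gated_h)]
            # Convex blend of the previous state and the candidate state.
            new_hidden.append(
                [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
            )
        return new_hidden


def run_stream(frames, dim, seed=0):
    # frames: list of time steps, each a list of token vectors of length dim.
    unit = GatedTokenUpdate(dim, seed)
    hidden = [[0.0] * dim for _ in frames[0]]
    for tokens in frames:
        hidden = unit.step(tokens, hidden)
    return hidden
```

Calling `run_stream(frames, dim=4)` on a list of time steps returns one hidden vector per token after the last step. Because each update is a convex blend of the previous state and a `tanh` candidate, the state stays bounded however long the stream runs.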
Taxonomy
Topics: Advanced Memory and Neural Computing · CCD and CMOS Imaging Sensors · Advanced Neural Network Applications
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Linear Layer · Spatial Transformer · Dense Connections · Residual Connection · Layer Normalization · Vision Transformer
