Event-based Monocular Dense Depth Estimation with Recurrent Transformers

Xu Liu; Jianing Li; Xiaopeng Fan; Yonghong Tian

arXiv:2212.02791 · cs.CV · December 7, 2022 · 5 citations

TL;DR

This paper introduces EReFormer, an event-based monocular depth estimation method built on recurrent transformers. It exploits both spatial and temporal information from asynchronous event streams and outperforms state-of-the-art methods on synthetic and real datasets.

Contribution

The paper presents the first pure transformer with a recursive mechanism for event-based monocular depth estimation, combining spatial and temporal modeling with improved efficiency.
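
The model consumes asynchronous event streams, but dense networks usually need events binned into a tensor first. A common choice in event-based vision is a temporal voxel grid. The sketch below assumes events arrive as rows (t, x, y, p) with polarity p in {-1, +1}; the function name, bin count, and layout are illustrative assumptions and are not taken from the EReFormer paper.

```python
import numpy as np


def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate events given as rows (t, x, y, p) into a (num_bins, height, width) grid.

    Hypothetical helper. Polarity p is assumed to be +1 or -1. Each event's
    polarity is split between the two temporally nearest bins with linear weights.
    """
    events = np.asarray(events, dtype=np.float64).reshape(-1, 4)
    grid = np.zeros((num_bins, height, width), dtype=np.float64)
    if events.shape[0] == 0:
        return grid
    t, p = events[:, 0], events[:, 3]
    x = events[:, 1].astype(np.int64)
    y = events[:, 2].astype(np.int64)
    # Rescale timestamps to the continuous bin coordinate [0, num_bins - 1].
    t_span = t.max() - t.min()
    if t_span > 0:
        t_norm = (t - t.min()) / t_span * (num_bins - 1)
    else:
        t_norm = np.zeros_like(t)
    left = np.floor(t_norm).astype(np.int64)
    frac = t_norm - left
    right = np.minimum(left + 1, num_bins - 1)
    # np.add.at accumulates correctly even when several events hit the same cell.
    np.add.at(grid, (left, y, x), p * (1.0 - frac))
    np.add.at(grid, (right, y, x), p * frac)
    return grid


events = [
    (0.00, 0, 0, +1),
    (0.25, 0, 0, +1),
    (1.00, 2, 1, -1),
]
voxels = events_to_voxel_grid(events, num_bins=2, height=2, width=3)
```

Splitting each event across two bins keeps the total polarity per pixel unchanged while preserving sub-bin timing, which plain per-bin counting would discard.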

Findings

1. EReFormer outperforms state-of-the-art methods on synthetic and real datasets.
2. The recursive transformer mechanism enhances temporal modeling capabilities.
3. The spatial transformer fusion module improves global context understanding.
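
The page does not describe how the spatial transformer fusion module is built. As a rough illustration of why attention-based fusion gives global context, the sketch below lets decoder tokens query every encoder (skip-connection) token through multi-head cross-attention, with layer normalization and a residual connection. The function `spatial_fusion_sketch`, the pre-norm layout, and all shapes are hypothetical assumptions, not the authors' module.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each token over its channel dimension (no learned scale/shift here).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_cross_attention(q_in, kv_in, w_q, w_k, w_v, w_o, num_heads):
    """Queries from q_in (Nq, C) attend over keys/values from kv_in (Nkv, C)."""
    n_q, c = q_in.shape
    n_kv = kv_in.shape[0]
    d = c // num_heads
    # Project and split channels into heads: (heads, tokens, d).
    q = (q_in @ w_q).reshape(n_q, num_heads, d).transpose(1, 0, 2)
    k = (kv_in @ w_k).reshape(n_kv, num_heads, d).transpose(1, 0, 2)
    v = (kv_in @ w_v).reshape(n_kv, num_heads, d).transpose(1, 0, 2)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d), axis=-1)  # (heads, Nq, Nkv)
    out = (attn @ v).transpose(1, 0, 2).reshape(n_q, c)  # merge heads back to C channels
    return out @ w_o


def spatial_fusion_sketch(decoder_tokens, encoder_tokens, params, num_heads=4):
    """Hypothetical fusion: every decoder token can attend to every encoder token."""
    q = layer_norm(decoder_tokens)
    kv = layer_norm(encoder_tokens)
    # Residual connection keeps the decoder features and adds globally attended context.
    return decoder_tokens + multi_head_cross_attention(q, kv, *params, num_heads=num_heads)


rng = np.random.default_rng(0)
C = 32
params = tuple(rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(4))  # w_q, w_k, w_v, w_o
decoder_tokens = rng.standard_normal((16, C))  # e.g. a 4x4 decoder feature map, flattened
encoder_tokens = rng.standard_normal((64, C))  # e.g. an 8x8 encoder skip feature map, flattened
fused = spatial_fusion_sketch(decoder_tokens, encoder_tokens, params)
```

Unlike a convolution, whose receptive field is local, each output token here is a weighted mix over all 64 encoder positions, which is the usual argument for attention-based fusion over CNN-based skip connections.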

Abstract

Event cameras, offering high temporal resolution and high dynamic range, bring a new perspective to common challenges in monocular depth estimation (e.g., motion blur and low light). However, effectively exploiting the sparse spatial information and rich temporal cues of asynchronous events remains challenging. To this end, we propose a novel event-based monocular depth estimator with recurrent transformers, namely EReFormer, which is the first pure transformer with a recursive mechanism to process continuous event streams. Technically, for spatial modeling, we present a novel transformer-based encoder-decoder with a spatial transformer fusion module, which models global context better than CNN-based methods. For temporal modeling, we design a gate recurrent vision transformer unit that introduces a recursive mechanism into…
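
The abstract only says that the gate recurrent vision transformer unit introduces a recursive mechanism into transformers, and the text is truncated before the details. As a rough illustration of the general idea, the sketch below follows the standard GRU update equations but computes each gate with single-head self-attention over tokens instead of convolution or a dense layer. The class `GatedRecurrentAttentionCell`, the gate layout, and all shapes are hypothetical assumptions; this is not the authors' unit.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over tokens: (N, D_in) -> (N, D_out)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N, N) token-to-token weights
    return attn @ v


class GatedRecurrentAttentionCell:
    """Hypothetical GRU-style cell whose gates come from self-attention."""

    def __init__(self, dim, rng):
        scale = 1.0 / np.sqrt(2 * dim)
        # One (w_q, w_k, w_v) triple per gate; each reads [x, h] (2*dim channels) -> dim.
        self.params = {
            gate: tuple(rng.standard_normal((2 * dim, dim)) * scale for _ in range(3))
            for gate in ("update", "reset", "candidate")
        }

    def step(self, x, h):
        """x: input tokens (N, dim) for one event window; h: previous state (N, dim)."""
        xh = np.concatenate([x, h], axis=-1)
        z = sigmoid(self_attention(xh, *self.params["update"]))  # how much to overwrite
        r = sigmoid(self_attention(xh, *self.params["reset"]))   # how much history to use
        xrh = np.concatenate([x, r * h], axis=-1)
        h_tilde = np.tanh(self_attention(xrh, *self.params["candidate"]))
        return (1.0 - z) * h + z * h_tilde


rng = np.random.default_rng(0)
cell = GatedRecurrentAttentionCell(dim=16, rng=rng)
h = np.zeros((64, 16))  # 64 tokens (e.g. an 8x8 patch grid) with 16 channels
for _ in range(5):  # five consecutive event windows
    x = rng.standard_normal((64, 16))  # stand-in for one window's encoder tokens
    h = cell.step(x, h)
```

Because each new state is a convex combination of the previous state and a tanh-bounded candidate, starting from a zero state keeps every entry bounded by 1 in magnitude across arbitrarily many windows, one reason GRU-style gating is a common choice for long recurrent rollouts over event streams.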

Taxonomy

Topics: Advanced Memory and Neural Computing · CCD and CMOS Imaging Sensors · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Linear Layer · Spatial Transformer · Dense Connections · Residual Connection · Layer Normalization · Vision Transformer