GlocalFuse-Depth: Fusing Transformers and CNNs for All-day Self-supervised Monocular Depth Estimation

Zezheng Zhang; Ryan K. Y. Chan; Kenneth K. Y. Wong

arXiv:2302.09884 · cs.CV · February 21, 2023

TL;DR

GlocalFuse-Depth introduces a dual-branch network combining CNNs and Transformers to improve self-supervised monocular depth estimation across all-day conditions, effectively handling domain shifts between daytime and nighttime images.

Contribution

The paper proposes a novel two-branch network with a fusion module that effectively combines CNN and Transformer features for all-day depth estimation.

Findings

1. Achieves state-of-the-art results on the Oxford RobotCar dataset.
2. Effectively handles the domain shift between daytime and nighttime images.
3. Demonstrates superior performance over existing methods.

Abstract

In recent years, self-supervised monocular depth estimation has drawn much attention, since it requires no depth annotations and has achieved remarkable results on standard benchmarks. However, most existing methods focus only on either daytime or nighttime images, so their performance degrades on the other domain because of the large domain shift between daytime and nighttime images. To address this problem, in this paper we propose a two-branch network named GlocalFuse-Depth for self-supervised depth estimation of all-day images. The daytime and nighttime images of each input image pair are fed into the two branches, a CNN branch and a Transformer branch, respectively, so that both fine-grained details and global dependencies can be captured efficiently. In addition, a novel fusion module is proposed to fuse multi-dimensional features from the two branches. Extensive experiments demonstrate that…
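
The abstract contrasts a CNN branch that captures fine-grained local detail with a Transformer branch that captures global dependencies, joined by a fusion module. The pure-Python sketch below illustrates only that local-versus-global distinction on a toy single-channel grid. It is not the authors' implementation. The 3x3 mean filter standing in for the CNN, the single-head attention with identity projections standing in for the Transformer, and the sigmoid gate in the hypothetical `gated_fusion` function are all illustrative assumptions. The paper's actual fusion module is not reproduced here.

```python
import math
from typing import List

Grid = List[List[float]]  # a single-channel H x W feature map


def local_branch(x: Grid) -> Grid:
    """Stand-in for the CNN branch (assumption): a 3x3 mean filter with zero
    padding, so each output depends only on its immediate neighbourhood."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        total += x[ii][jj]
            out[i][j] = total / 9.0  # zero padding: missing neighbours count as 0
    return out


def global_branch(x: Grid) -> Grid:
    """Stand-in for the Transformer branch (assumption): single-head
    self-attention over all pixels with scalar tokens and identity
    query/key/value projections, so every output mixes *all* positions."""
    h, w = len(x), len(x[0])
    tokens = [v for row in x for v in row]  # flatten H x W into H*W tokens
    out_flat = []
    for q in tokens:
        # Dot-product scores; the 1/sqrt(d) scaling equals 1 for d = 1.
        scores = [q * k for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out_flat.append(sum(wt * v for wt, v in zip(weights, tokens)) / z)
    return [out_flat[r * w:(r + 1) * w] for r in range(h)]  # back to H x W


def gated_fusion(local: Grid, glob: Grid, a: float = 1.0, b: float = -1.0) -> Grid:
    """Hypothetical per-pixel gated fusion, NOT the paper's fusion module:
    gate = sigmoid(a*local + b*global); fused = gate*local + (1-gate)*global."""
    fused = []
    for lrow, grow in zip(local, glob):
        row = []
        for lv, gv in zip(lrow, grow):
            gate = 1.0 / (1.0 + math.exp(-(a * lv + b * gv)))
            row.append(gate * lv + (1.0 - gate) * gv)
        fused.append(row)
    return fused


if __name__ == "__main__":
    image = [[0.1, 0.2, 0.3, 0.4],
             [0.2, 0.9, 0.9, 0.3],
             [0.1, 0.8, 0.7, 0.2],
             [0.0, 0.1, 0.2, 0.1]]
    fused = gated_fusion(local_branch(image), global_branch(image))
    print(len(fused), len(fused[0]))  # 4 4
```

For simplicity, the same toy input is fed to both branches here. The routing of daytime and nighttime images described in the abstract is not modelled. Because the gate lies strictly between 0 and 1, each fused value is a convex combination of the local and global values at that pixel. Changing a distant pixel alters every `global_branch` output but leaves `local_branch` outputs outside its 3x3 neighbourhood untouched.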

Taxonomy

Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Advanced Image Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Dense Connections · Absolute Position Encodings · Adam · Position-Wise Feed-Forward Layer · Dropout · Byte Pair Encoding