SDformer: Efficient End-to-End Transformer for Depth Completion

Jian Qian; Miao Sun; Ashley Lee; Jie Li; Shenglong Zhuo; Patrick Yin; Chiang

arXiv:2409.08159·cs.CV·September 13, 2024



TL;DR

SDformer is a window-based Transformer architecture for depth completion that captures long-range dependencies at lower computational cost and outperforms CNN-based models on the NYU Depth V2 and KITTI datasets.

Contribution

The paper proposes the Sparse-to-Dense Transformer (SDformer), which uses windowed self-attention for efficient depth completion and addresses the limited representation area of CNN-based models.

Findings

1. Achieves state-of-the-art results on NYU Depth V2 and KITTI datasets.
2. Reduces computational load and parameters compared to CNN models.
3. Effectively captures long-range dependencies with window-based self-attention.
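
The window-based self-attention named in the findings restricts each attention computation to a local group of tokens. The following is a minimal sketch of that grouping step only, assuming non-overlapping square windows over a 2-D token grid. The summary does not say how SDformer forms its windows, so the function name `window_partition`, the 4x4 grid, and the window size of 2 are all hypothetical choices for illustration.

```python
def window_partition(grid, window):
    """Split a 2-D grid (a list of rows) into non-overlapping window x window tiles.

    Assumption: square, non-overlapping windows. This is an illustrative
    scheme, not confirmed to be the partitioning used by SDformer.
    """
    height, width = len(grid), len(grid[0])
    if height % window or width % window:
        raise ValueError("grid size must be divisible by the window size")
    tiles = []
    # Walk the grid in steps of `window`, row-major over the tiles.
    for top in range(0, height, window):
        for left in range(0, width, window):
            tile = [grid[r][left:left + window] for r in range(top, top + window)]
            tiles.append(tile)
    return tiles


# A 4x4 grid of token indices (hypothetical size) split into 2x2 windows.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = window_partition(grid, 2)
```

Self-attention would then be computed independently inside each tile, so a token in one tile never forms a key-query pair with a token in another tile during that step.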

Abstract

Depth completion aims to predict dense depth maps from sparse depth measurements captured by a depth sensor. Currently, Convolutional Neural Network (CNN) based models are the most popular methods for depth completion. However, despite their excellent performance, they suffer from a limited representation area. To overcome this drawback, a more effective and powerful method has been presented: the Transformer, a sequence-to-sequence model built on adaptive self-attention. The standard Transformer, however, incurs a computational cost that grows quadratically with input resolution because of the key-query dot product, which makes it poorly suited to depth completion tasks. In this work, we propose a different window-based Transformer architecture for depth completion tasks named Sparse-to-Dense Transformer (SDformer). The network consists of an input module for the depth map and RGB…
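
The quadratic-cost argument in the abstract can be made concrete by counting key-query pairs. The sketch below is a back-of-the-envelope comparison, not the paper's own analysis: the 64x64 feature map and the window size of 8 are assumed values chosen for illustration, and both function names are hypothetical.

```python
def global_attention_pairs(height, width):
    """Key-query pairs when every token attends to every other token."""
    tokens = height * width
    return tokens * tokens


def windowed_attention_pairs(height, width, window):
    """Key-query pairs when attention is limited to non-overlapping windows.

    Assumption: square windows that tile the feature map exactly.
    """
    if height % window or width % window:
        raise ValueError("feature map size must be divisible by the window size")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window ** 2


# Hypothetical sizes: a 64x64 feature map with 8x8 windows.
full = global_attention_pairs(64, 64)          # 4096 * 4096 = 16,777,216
local = windowed_attention_pairs(64, 64, 8)    # 64 windows * 64^2 = 262,144
ratio = full // local                          # 64
```

Under these assumptions the windowed variant evaluates 64 times fewer pairs, and the saving grows with resolution because the global count scales with the square of the token count while the windowed count scales linearly for a fixed window size.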


Code & Models

Repositories

jamesqian11/sdformer-for-depth-completion (PyTorch, official)


Taxonomy

Topics: Industrial Vision Systems and Defect Detection

Methods: Byte Pair Encoding · Absolute Position Encodings · Softmax · Label Smoothing · Dropout · Layer Normalization · Attention Is All You Need · Position-Wise Feed-Forward Layer · Linear Layer · Adam