SDformer: Efficient End-to-End Transformer for Depth Completion
Jian Qian, Miao Sun, Ashley Lee, Jie Li, Shenglong Zhuo, Patrick Yin Chiang

TL;DR
SDformer introduces a window-based Transformer architecture for depth completion. It captures long-range dependencies at lower computational cost than CNN-based models and outperforms them on standard datasets.
Contribution
The paper proposes the Sparse-to-Dense Transformer (SDformer), which uses windowed self-attention for efficient depth completion and addresses the limited representation area of CNNs.
Findings
Achieves state-of-the-art results on NYU Depth V2 and KITTI datasets.
Reduces computational load and parameter count compared to CNN-based models.
Effectively captures long-range dependencies with window-based self-attention.
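To make the cost claim concrete, the following is a minimal sketch that counts query-key pairs for global attention versus attention restricted to non-overlapping windows. The function names, the 64 x 64 token grid, and the window size of 8 are illustrative assumptions and are not taken from the paper.

```python
def global_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs when every token attends to every token."""
    tokens = height * width
    return tokens * tokens


def window_attention_pairs(height: int, width: int, window: int) -> int:
    """Query-key pairs when attention is limited to non-overlapping windows."""
    if height % window or width % window:
        raise ValueError("height and width must be divisible by window")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window * tokens_per_window


if __name__ == "__main__":
    h, w, m = 64, 64, 8  # hypothetical sizes, chosen only for illustration
    print("global:", global_attention_pairs(h, w))
    print("windowed:", window_attention_pairs(h, w, m))
```

With a fixed window size, the windowed count grows linearly with the number of tokens, whereas the global count grows quadratically.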
Abstract
Depth completion aims to predict dense depth maps from sparse depth measurements provided by a depth sensor. Currently, Convolutional Neural Network (CNN) based models are the most popular methods applied to depth completion tasks. However, despite their excellent high-end performance, they suffer from a limited representation area. To overcome the drawbacks of CNNs, a more effective and powerful method has been presented: the Transformer, a sequence-to-sequence model built on adaptive self-attention. The computational cost of the standard Transformer, however, grows quadratically with input resolution because of the key-query dot product, which makes it poorly suited to depth completion tasks. In this work, we propose a different window-based Transformer architecture for depth completion tasks named Sparse-to-Dense Transformer (SDformer). The network consists of an input module for the depth map and RGB…
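As a rough illustration of the window-based self-attention idea described in the abstract, here is a minimal NumPy sketch. It is not the SDformer implementation: the single-head formulation, the non-overlapping window partition, and all names (window_partition, window_reverse, window_self_attention, and the projection matrices wq, wk, wv) are assumptions made for this example.

```python
import numpy as np


def window_partition(x, window):
    """Split an (H, W, C) feature map into (num_windows, window*window, C)."""
    h, w, c = x.shape
    x = x.reshape(h // window, window, w // window, window, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window * window, c)


def window_reverse(windows, window, h, w):
    """Inverse of window_partition: reassemble windows into (H, W, C)."""
    c = windows.shape[-1]
    x = windows.reshape(h // window, w // window, window, window, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h, w, c)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def window_self_attention(x, window, wq, wk, wv):
    """Single-head self-attention computed independently inside each window."""
    h, w, _ = x.shape
    tokens = window_partition(x, window)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    # Scores only relate tokens that share a window: shape (num_windows, M*M, M*M).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    out = softmax(scores) @ v
    return window_reverse(out, window, h, w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 8, 4))  # hypothetical fused feature map
    wq, wk, wv = (rng.standard_normal((4, 4)) for _ in range(3))
    print(window_self_attention(feat, 4, wq, wk, wv).shape)
```

Because each window is processed independently, a change to one pixel only affects outputs inside the same window. Window-based designs typically add a further mechanism to pass information between windows; the sketch above omits that step.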
Taxonomy
Topics: Industrial Vision Systems and Defect Detection
Methods: Byte Pair Encoding · Absolute Position Encodings · Softmax · Label Smoothing · Dropout · Layer Normalization · Attention Is All You Need · Position-Wise Feed-Forward Layer · Linear Layer · Adam
