Gated Cross-Attention Network for Depth Completion
Xiaogang Jia, Songlei Jian, Yusong Tan, Yonggang Che, Wei Chen, and Zhengfa Liang

TL;DR
This paper introduces a Gated Cross-Attention Network that effectively fuses color and depth features for fast, accurate depth completion, outperforming existing methods on benchmark datasets.
Contribution
The paper proposes a novel Gated Cross-Attention Network with a gating mechanism and Transformer-based global feature fusion, achieving state-of-the-art depth completion without extra post-processing.
Findings
Achieves Pareto-optimal speed and accuracy trade-offs.
Ranks first on the KITTI depth completion benchmark.
Performs well on indoor and outdoor datasets.
Abstract
Depth completion is a popular research direction in the field of depth estimation. The fusion of color and depth features is the current critical challenge in this task, mainly due to the asymmetry between the rich scene details in color images and the sparse pixels in depth maps. To tackle this issue, we design an efficient Gated Cross-Attention Network that propagates confidence via a gating mechanism, simultaneously extracting and refining key information in both color and depth branches to achieve local spatial feature fusion. Additionally, we employ an attention network based on the Transformer in low-dimensional space to effectively fuse global features and increase the network's receptive field. With a simple yet efficient gating mechanism, our proposed method achieves fast and accurate depth completion without the need for additional branches or post-processing steps. At the…
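The gating idea described in the abstract can be illustrated with a very small sketch. This is not the paper's architecture: it is a generic per-pixel gated blend, assuming scalar feature maps and a sigmoid gate driven by both inputs. The function name `gated_fusion` and the parameters `w_color`, `w_depth`, and `bias` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(color, depth, w_color=1.0, w_depth=1.0, bias=0.0):
    """Blend two equally sized 2D feature maps with a per-pixel gate.

    Illustrative assumption, not the paper's formulation:
        gate  = sigmoid(w_color * c + w_depth * d + bias)
        fused = gate * d + (1 - gate) * c
    A gate near 1 trusts the depth feature; a gate near 0 trusts the color feature.
    """
    if len(color) != len(depth) or any(
        len(row_c) != len(row_d) for row_c, row_d in zip(color, depth)
    ):
        raise ValueError("color and depth feature maps must have the same shape")

    fused = []
    for row_c, row_d in zip(color, depth):
        fused_row = []
        for c, d in zip(row_c, row_d):
            gate = sigmoid(w_color * c + w_depth * d + bias)
            fused_row.append(gate * d + (1.0 - gate) * c)
        fused.append(fused_row)
    return fused


if __name__ == "__main__":
    color_features = [[0.0, 2.0], [1.0, 3.0]]
    depth_features = [[0.0, 0.0], [4.0, 0.0]]  # zeros stand in for missing depth
    print(gated_fusion(color_features, depth_features))
```

Because the gate lies between 0 and 1, each fused value is a convex combination of the two inputs at that pixel. In a learned network the gate weights would be trained convolution parameters rather than fixed scalars.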
Taxonomy
Topics: Machine Learning and Data Classification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Residual Connection · Sigmoid Activation · Tanh Activation · Label Smoothing · Long Short-Term Memory · Absolute Position Encodings
