URCDC-Depth: Uncertainty Rectified Cross-Distillation with CutFlip for Monocular Depth Estimation
Shuwei Shao, Zhongcai Pei, Weihai Chen, Ran Li, Zhong Liu, Zhengguo Li

TL;DR
URCDC-Depth introduces an uncertainty-aware cross-distillation framework that combines a Transformer branch and a CNN branch for monocular depth estimation. Together with a new data augmentation technique, CutFlip, it achieves state-of-the-art results without extra inference cost.
Contribution
The paper proposes a novel uncertainty rectified cross-distillation method with feature transfer and a simple data augmentation, significantly improving monocular depth estimation accuracy.
Findings
Outperforms previous state-of-the-art on KITTI, NYU-Depth-v2, and SUN RGB-D datasets.
Effectively models pixel-wise depth uncertainty to improve pseudo label quality.
Uses CutFlip augmentation to enhance model robustness and depth inference.
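The summary above names CutFlip but does not describe how it works. The following is a minimal sketch under one assumption: that CutFlip cuts the image and its depth map at a random height and swaps the upper and lower parts, so the model cannot rely on vertical position alone. The function name `cutflip`, the probability `p`, and the row-list representation are hypothetical choices for illustration, not the authors' implementation.

```python
import random

def cutflip(image_rows, depth_rows, p=0.5, rng=random):
    """Assumed CutFlip: swap the upper and lower parts at a random row.

    image_rows and depth_rows are lists of rows with the same height.
    With probability 1 - p the inputs are returned unchanged.
    """
    h = len(image_rows)
    if h < 2 or rng.random() >= p:
        return image_rows, depth_rows
    # Pick a cut so that both the upper and the lower part are non-empty.
    cut = rng.randint(1, h - 1)
    # Apply the same swap to image and depth so the labels stay aligned.
    flipped_image = image_rows[cut:] + image_rows[:cut]
    flipped_depth = depth_rows[cut:] + depth_rows[:cut]
    return flipped_image, flipped_depth
```

Because the swap only happens while building training samples, an augmentation of this kind adds nothing to inference, which is consistent with the "no extra inference cost" claim in the TL;DR.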
Abstract
This work aims to estimate a high-quality depth map from a single RGB image. Due to the lack of depth clues, making full use of the long-range correlation and the local information is critical for accurate depth estimation. Towards this end, we introduce an uncertainty rectified cross-distillation between Transformer and convolutional neural network (CNN) to learn a unified depth estimator. Specifically, we use the depth estimates from the Transformer branch and the CNN branch as pseudo labels to teach each other. Meanwhile, we model the pixel-wise depth uncertainty to rectify the loss weights of noisy pseudo labels. To avoid the large capacity gap induced by the strong Transformer branch deteriorating the cross-distillation, we transfer the feature maps from Transformer to CNN and design coupling units to assist the weak CNN branch to leverage the transferred features. Furthermore, we…
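The cross-distillation idea in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's loss: it assumes each branch also predicts a per-pixel log-uncertainty, and it uses a common uncertainty-weighted L1 form, exp(-u) * |student - pseudo| + u, to down-weight noisy pseudo labels. The names `rectified_distill_loss` and `cross_distill`, and the flat-list depth maps, are hypothetical.

```python
import math

def rectified_distill_loss(student, pseudo_label, log_uncertainty):
    """Uncertainty-weighted L1 between a student and a pseudo label.

    Assumed form: pixels with high uncertainty u get weight exp(-u),
    and the +u term keeps u from growing without bound.
    """
    total = 0.0
    for s, t, u in zip(student, pseudo_label, log_uncertainty):
        total += math.exp(-u) * abs(s - t) + u
    return total / len(student)

def cross_distill(transformer_depth, cnn_depth, transformer_log_unc, cnn_log_unc):
    """Each branch is taught by the other branch's depth estimate.

    In a real training framework the pseudo label would be detached
    from the gradient graph; plain floats need no such step.
    """
    # CNN learns from the Transformer's estimate, weighted by its uncertainty.
    loss_cnn = rectified_distill_loss(cnn_depth, transformer_depth, transformer_log_unc)
    # Transformer learns from the CNN's estimate, weighted by its uncertainty.
    loss_transformer = rectified_distill_loss(transformer_depth, cnn_depth, cnn_log_unc)
    return loss_cnn + loss_transformer
```

With zero log-uncertainty everywhere, the sketch reduces to a plain symmetric L1 distillation; raising the uncertainty at a pixel where the branches disagree strongly lowers that pixel's contribution.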
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Optical measurement and interference techniques
Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Linear Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Adam · Position-Wise Feed-Forward Layer · Softmax
