Improving Equivariance in State-of-the-Art Supervised Depth and Normal Predictors
Yuanyi Zhong, Anand Bhattad, Yu-Xiong Wang, David Forsyth

TL;DR
This paper identifies that current state-of-the-art depth and normal predictors lack cropping-and-resizing equivariance and proposes a regularization method to explicitly enforce this property, improving their accuracy and robustness.
Contribution
The authors introduce an equivariant regularization technique that enhances cropping-and-resizing equivariance in depth and normal predictors across CNN and Transformer models.
Findings
Improved equivariance in depth and normal predictions.
Enhanced accuracy on Taskonomy and NYU-v2 datasets.
Applicable to both supervised and semi-supervised learning.
Abstract
Dense depth and surface normal predictors should be equivariant to cropping-and-resizing: cropping the input image should produce the correspondingly cropped output. However, we find that state-of-the-art depth and normal predictors, despite their strong performance, surprisingly do not respect equivariance. The problem exists even when crop-and-resize data augmentation is employed during training. To remedy this, we propose an equivariant regularization technique, consisting of an averaging procedure and a self-consistency loss, to explicitly promote cropping-and-resizing equivariance in depth and normal networks. Our approach can be applied to both CNN and Transformer architectures, does not incur extra cost during testing, and notably improves the supervised and semi-supervised learning performance of dense predictors on Taskonomy tasks. Finally, finetuning with…
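The equivariance property described in the abstract can be checked numerically by comparing "predict on the crop" against "crop the prediction". The following is a minimal sketch under stated assumptions, not the authors' implementation: the helper names crop_and_resize and equivariance_gap, the nearest-neighbour resize, and the two toy predictors are all hypothetical and chosen only for illustration. A penalty of this kind could in principle be added to a training loss as a consistency term, but the paper's actual averaging procedure and self-consistency loss are not specified in this summary.

```python
import numpy as np

def crop_and_resize(img, box, out_hw):
    """Crop box=(top, left, height, width) from an (H, W) or (H, W, C) array,
    then resize to out_hw with nearest-neighbour sampling (an assumption made
    for simplicity; a real pipeline would likely use bilinear resizing)."""
    top, left, h, w = box
    patch = img[top:top + h, left:left + w]
    out_h, out_w = out_hw
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return patch[rows][:, cols]

def equivariance_gap(predict, image, box):
    """Mean absolute difference between predict(crop(image)) and
    crop(predict(image)). Zero means the predictor is exactly equivariant
    to this particular crop-and-resize."""
    out_hw = image.shape[:2]
    pred_of_crop = predict(crop_and_resize(image, box, out_hw))
    crop_of_pred = crop_and_resize(predict(image), box, out_hw)
    return float(np.mean(np.abs(pred_of_crop - crop_of_pred)))

# Toy "predictors" standing in for a depth network (hypothetical):
pointwise = lambda x: 2.0 * x + 1.0   # depends only on each pixel's own value
mean_shift = lambda x: x - x.mean()   # depends on global image statistics

demo_image = np.arange(64, dtype=float).reshape(8, 8)
demo_box = (0, 0, 4, 4)               # top-left quarter of the image

print("pointwise gap: ", equivariance_gap(pointwise, demo_image, demo_box))
print("mean-shift gap:", equivariance_gap(mean_shift, demo_image, demo_box))
```

The pointwise predictor commutes with cropping, while the mean-shift predictor does not, because its output at each pixel depends on content outside the crop. This is a toy analogue of how a network that uses global context can violate equivariance.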
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Human Pose and Action Recognition
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Absolute Position Encodings · Dense Connections · Layer Normalization · Byte Pair Encoding
