Designing Deep Networks for Surface Normal Estimation
Xiaolong Wang, David F. Fouhey, Abhinav Gupta

TL;DR
This paper introduces a new CNN architecture for surface normal estimation from a single image, leveraging scene constraints and intermediate representations to achieve state-of-the-art results and robustness across datasets.
Contribution
The paper proposes a CNN architecture for single-image surface normal estimation that builds scene constraints (man-made, Manhattan world) and intermediate representations (room layout, edge labels) into the network.
Findings
Achieves state-of-the-art performance on surface normal estimation.
Robust results across multiple datasets without fine-tuning.
Incorporating scene constraints improves accuracy.
Abstract
In the past few years, convolutional neural nets (CNNs) have shown incredible promise for learning visual representations. In this paper, we use CNNs for the task of predicting surface normals from a single image. But what is the right architecture to use? We propose to build upon decades of hard work in 3D scene understanding to design a new CNN architecture for the task of surface normal estimation. We show that incorporating several constraints (man-made, Manhattan world) and meaningful intermediate representations (room layout, edge labels) in the architecture leads to state-of-the-art performance on surface normal estimation. We also show that our network is quite robust, achieving state-of-the-art results on other datasets without any fine-tuning.
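To make the terminology concrete, here is a minimal sketch of what a Manhattan-world constraint means for a single predicted normal: under that assumption, surfaces align with one of three mutually orthogonal directions, so a noisy normal can be snapped to the nearest signed axis. This is an illustration only, not the paper's architecture or code. The helper names (`normalize`, `snap_to_manhattan`, `angular_error_deg`) and the choice of the coordinate axes as the Manhattan frame are assumptions made for this example.

```python
import math

# Assumption for illustration: the Manhattan frame is the coordinate axes.
# In a real scene the three dominant directions would have to be estimated.
MANHATTAN_AXES = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]


def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("zero-length vector has no direction")
    return tuple(c / norm for c in v)


def snap_to_manhattan(normal, axes=MANHATTAN_AXES):
    """Return the signed axis direction closest to `normal` (hypothetical helper)."""
    n = normalize(normal)
    best_axis, best_dot = None, 0.0
    for axis in axes:
        dot = sum(a * b for a, b in zip(n, axis))
        # Compare by absolute value: a surface may face either way along an axis.
        if best_axis is None or abs(dot) > abs(best_dot):
            best_axis, best_dot = axis, dot
    sign = 1.0 if best_dot >= 0.0 else -1.0
    # Avoid emitting -0.0 for the zero components.
    return tuple(sign * a if a != 0.0 else 0.0 for a in best_axis)


def angular_error_deg(predicted, target):
    """Angle in degrees between two normals, a common way to score predictions."""
    p, t = normalize(predicted), normalize(target)
    dot = sum(a * b for a, b in zip(p, t))
    dot = max(-1.0, min(1.0, dot))  # clamp against floating-point drift
    return math.degrees(math.acos(dot))


noisy = (0.1, -0.9, 0.2)
snapped = snap_to_manhattan(noisy)
print(snapped, angular_error_deg(noisy, snapped))
```

The sketch applies the constraint as a hard post-processing step on one vector; the abstract describes incorporating such constraints into the network design itself, which this example does not attempt to reproduce.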
Taxonomy
Topics: Advanced Vision and Imaging · 3D Surveying and Cultural Heritage · 3D Shape Modeling and Analysis
