Look, Cast and Mold: Learning 3D Shape Manifold from Single-view Synthetic Data
Qianyu Feng, Yawei Luo, Keyang Luo, Yi Yang

TL;DR
This paper introduces VPAN, a novel deep learning framework that learns 3D shape representations from synthetic single-view images, effectively bridging domain gaps and improving 3D reconstruction accuracy.
Contribution
The paper proposes a Visio-Perceptual Adaptive Network (VPAN) that integrates spatial structure, cross-modal semantic alignment, and shape manifold transformation for single-view 3D reconstruction from synthetic data.
Findings
Reports IoU (intersection over union) 0.292 and CD (Chamfer distance) 0.108 on Pix3D, outperforming prior state-of-the-art methods
Achieves IoU 0.329 and CD 0.104 on Pascal 3D+
Demonstrates robustness and effectiveness in learning 3D shape manifolds from synthetic data
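For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how voxel IoU and a symmetric Chamfer distance are commonly computed. It is an illustration only: the function names are hypothetical, the squared-distance form of CD is one common convention, and the exact variant, sampling density, and normalization used in the paper are not stated in this summary.

```python
from math import dist


def voxel_iou(pred, target):
    """Intersection over union of two sets of occupied voxel coordinates."""
    pred, target = set(pred), set(target)
    union = pred | target
    if not union:
        # Both grids empty: treated as perfect agreement (a convention, assumed here).
        return 1.0
    return len(pred & target) / len(union)


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance: mean squared nearest-neighbor distance, both directions."""
    def one_way(src, dst):
        # For each source point, squared distance to its closest point in dst.
        return sum(min(dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


if __name__ == "__main__":
    pred = [(0, 0, 0), (1, 0, 0)]
    target = [(1, 0, 0), (2, 0, 0)]
    print(voxel_iou(pred, target))
    print(chamfer_distance(pred, target))
```

Higher IoU is better (more overlap), while lower CD is better (surfaces lie closer together), which is why the two numbers move in opposite directions when a method improves.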
Abstract
Inferring the stereo structure of objects in the real world is a challenging yet practical task. To equip deep models with this ability usually requires abundant 3D supervision which is hard to acquire. It is promising that we can simply benefit from synthetic data, where pairwise ground-truth is easy to access. Nevertheless, the domain gaps are nontrivial considering the variant texture, shape and context. To overcome these difficulties, we propose a Visio-Perceptual Adaptive Network for single-view 3D reconstruction, dubbed VPAN. To generalize the model towards a real scenario, we propose to fulfill several aspects: (1) Look: visually incorporate spatial structure from the single view to enhance the expressiveness of representation; (2) Cast: perceptually align the 2D image features to the 3D shape priors with cross-modal semantic contrastive mapping; (3) Mold: reconstruct…
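The "Cast" step describes aligning 2D image features with 3D shape priors through contrastive mapping. The abstract shown here does not give the loss, so the sketch below uses a generic InfoNCE-style objective as a stand-in: each image embedding is pulled toward its paired shape embedding and pushed away from the other shapes in the batch. The function names, the cosine similarity, and the temperature value are all assumptions for illustration, not the authors' formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def info_nce(image_feats, shape_feats, temperature=0.1):
    """Generic InfoNCE loss; image_feats[i] and shape_feats[i] form the positive pair."""
    losses = []
    for i, img in enumerate(image_feats):
        logits = [cosine(img, shp) / temperature for shp in shape_feats]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the matching shape.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


if __name__ == "__main__":
    images = [(1.0, 0.0), (0.0, 1.0)]
    aligned_shapes = [(1.0, 0.0), (0.0, 1.0)]
    swapped_shapes = [(0.0, 1.0), (1.0, 0.0)]
    print(info_nce(images, aligned_shapes))
    print(info_nce(images, swapped_shapes))
```

In this toy case the loss is close to zero when each image embedding matches its own shape embedding and large when the pairs are swapped, which is the behavior a cross-modal alignment objective is meant to reward.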
Taxonomy
Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Human Pose and Action Recognition
