FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis
Vishnu Mani Hema, Shubhra Aich, Christian Haene, Jean-Charles Bazin, Fernando de la Torre

TL;DR
This paper introduces FAMOUS, a method that uses large-scale 2D fashion datasets and view synthesis to improve high-fidelity 3D human digitization from a single image, especially in texture inference.
Contribution
FAMOUS leverages 2D priors from fashion datasets, together with a domain alignment strategy, to improve texture and shape prediction in 3D human models reconstructed from monocular images.
Findings
Outperforms existing methods in texture accuracy
Achieves superior geometric reconstruction on benchmarks
Effectively infers occluded back views using 2D data
Abstract
The advancement in deep implicit modeling and articulated models has significantly enhanced the process of digitizing human figures in 3D from just a single image. While state-of-the-art methods have greatly improved geometric precision, the challenge of accurately inferring texture remains, particularly in obscured areas such as the back of a person in frontal-view images. This limitation in texture prediction largely stems from the scarcity of large-scale and diverse 3D datasets, whereas their 2D counterparts are abundant and easily accessible. To address this issue, our paper proposes leveraging extensive 2D fashion datasets to enhance both texture and shape prediction in 3D human digitization. We incorporate 2D priors from the fashion dataset to learn the occluded back view, refined with our proposed domain alignment strategy. We then fuse this information with the input image to…
