FOF-X: Towards Real-time Detailed Human Reconstruction from a Single Image
Qiao Feng, Yuanwang Yang, Yebin Liu, Yu-Kun Lai, Jingyu Yang, Kun Li

TL;DR
FOF-X is a real-time system for detailed human reconstruction from a single image, utilizing a novel Fourier Occupancy Field representation to balance speed and quality.
Contribution
We introduce the Fourier Occupancy Field (FOF), a new 3D representation that enables real-time, high-quality human reconstruction from a single image, with improved robustness and compatibility with 2D CNNs.
Findings
Achieves state-of-the-art results on multiple datasets.
Runs in real-time with high detail quality.
Demonstrates robustness to domain gaps and lighting variations.
Abstract
We introduce FOF-X for real-time reconstruction of detailed human geometry from a single image. Balancing real-time speed against high-quality results is a persistent challenge, mainly due to the high computational demands of existing 3D representations. To address this, we propose the Fourier Occupancy Field (FOF), an efficient 3D representation based on learning a Fourier series. The core of FOF is to factorize a 3D occupancy field into a 2D vector field, retaining topology and spatial relationships within the 3D domain while remaining compatible with 2D convolutional neural networks. Such a representation bridges the gap between 3D and 2D domains, enabling the integration of human parametric models as priors and enhancing reconstruction robustness. Based on FOF, we design a new reconstruction framework, FOF-X, to avoid the performance degradation caused by texture and lighting.…
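To make the factorization idea in the abstract concrete, here is a minimal NumPy sketch of how a 3D occupancy grid could be collapsed into a 2D field of per-pixel Fourier coefficients and expanded back. It is an illustration under stated assumptions, not the authors' implementation: the depth range [-1, 1], the basis ordering, the number of terms (16), the 0.5 threshold, and all function names are hypothetical choices. In FOF-X the coefficients would be predicted from the image by a network, whereas here they are computed from a known occupancy grid.

```python
import numpy as np


def depth_samples(depth):
    """Voxel-center coordinates along the view axis, mapped to [-1, 1)."""
    return (np.arange(depth) + 0.5) * (2.0 / depth) - 1.0


def fourier_basis(z, num_terms):
    """Synthesis basis with columns 1/2, cos(n*pi*z), sin(n*pi*z) for n = 1..N.

    Returns an array of shape (len(z), 2 * num_terms + 1).
    """
    n = np.arange(1, num_terms + 1)
    angles = np.pi * z[:, None] * n[None, :]
    half = np.full((z.size, 1), 0.5)
    return np.concatenate([half, np.cos(angles), np.sin(angles)], axis=1)


def occupancy_to_fof(occ, num_terms=16):
    """Project an (H, W, D) occupancy grid onto the basis along depth.

    Each pixel's occupancy-versus-depth profile becomes one coefficient
    vector, so the result is a 2D vector field of shape (H, W, 2N + 1).
    """
    depth = occ.shape[-1]
    kernels = fourier_basis(depth_samples(depth), num_terms)
    kernels[:, 0] = 1.0  # a_0 integrates against 1; the 1/2 is applied at synthesis
    dz = 2.0 / depth     # midpoint-rule weight for the integral over [-1, 1]
    return occ.astype(np.float64) @ kernels * dz


def fof_to_occupancy(coeffs, depth, threshold=0.5):
    """Evaluate the truncated series at every depth sample and threshold it."""
    num_terms = (coeffs.shape[-1] - 1) // 2
    basis = fourier_basis(depth_samples(depth), num_terms)
    return coeffs @ basis.T > threshold


# Toy input (assumption): a solid sphere on a 16 x 16 x 64 grid.
xy = depth_samples(16)
x, y, z = np.meshgrid(xy, xy, depth_samples(64), indexing="ij")
sphere = x**2 + y**2 + z**2 < 0.6**2

fof = occupancy_to_fof(sphere)            # 2D field of coefficient vectors
recon = fof_to_occupancy(fof, depth=64)   # back to a 3D occupancy grid
iou = (recon & sphere).sum() / (recon | sphere).sum()
print(fof.shape, iou)
```

Because the coefficient field has the layout of a multi-channel image, it can in principle be produced by an ordinary 2D CNN, which is the compatibility the abstract refers to. Truncating the series smooths sharp depth transitions, so the number of terms trades detail against channel count.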
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Medical Imaging Techniques and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
