Generalizable Human Gaussians from Single-View Image
Jinnan Chen, Chen Li, Jianfeng Zhang, Lingting Zhu, Buzhen Huang, Hanlin Chen, Gim Hee Lee

TL;DR
This paper presents a novel single-view human Gaussian model that reconstructs detailed 3D human appearance and geometry, including unobserved regions, using a generate-then-refine pipeline guided by human priors and diffusion models.
Contribution
The introduction of a generalizable Human Gaussian Model with a generate-then-refine pipeline and integration of human priors for improved 3D human reconstruction from a single image.
Findings
Outperforms previous methods in view synthesis and surface reconstruction
Demonstrates strong generalization across datasets and in-the-wild images
Effectively refines initial SMPL-X estimates for better accuracy
Abstract
In this work, we tackle the task of learning 3D human Gaussians from a single image, focusing on recovering detailed appearance and geometry including unobserved regions. We introduce a single-view generalizable Human Gaussian Model (HGM), which employs a novel generate-then-refine pipeline with the guidance from human body prior and diffusion prior. Our approach uses a ControlNet to refine rendered back-view images from coarse predicted human Gaussians, then uses the refined image along with the input image to reconstruct refined human Gaussians. To mitigate the potential generation of unrealistic human poses and shapes, we incorporate human priors from the SMPL-X model as a dual branch, propagating image features from the SMPL-X volume to the image Gaussians using sparse convolution and attention mechanisms. Given that the initial SMPL-X estimation might be inaccurate, we gradually…
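The data flow described in the abstract can be sketched as a short control-flow skeleton. This is only an illustrative sketch, not the authors' implementation: the `Stages` container, the four stage callables, and the function name `generate_then_refine` are all hypothetical, and passing the SMPL-X estimate to both the coarse and the refined stage is an assumption. The gradual SMPL-X refinement mentioned at the end of the abstract is not modeled here.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Stages:
    """Hypothetical stage callables; none of these names come from the paper."""
    # (input image, SMPL-X estimate) -> coarse human Gaussians
    predict_coarse: Callable[[Any, Any], Any]
    # coarse Gaussians -> rendered (coarse) back-view image
    render_back: Callable[[Any], Any]
    # coarse back-view image -> refined back-view image (ControlNet in the paper)
    refine_back: Callable[[Any], Any]
    # (input image, refined back view, SMPL-X estimate) -> refined Gaussians
    reconstruct: Callable[[Any, Any, Any], Any]


def generate_then_refine(image: Any, smplx: Any, stages: Stages) -> Any:
    """Run the generate-then-refine flow with pluggable stages."""
    # Generate: predict coarse Gaussians from the single input view.
    coarse = stages.predict_coarse(image, smplx)
    # Render the unobserved back view from the coarse Gaussians.
    back_view = stages.render_back(coarse)
    # Refine the rendered back view with the diffusion prior.
    refined_back = stages.refine_back(back_view)
    # Refine: reconstruct Gaussians from the input and refined back view.
    return stages.reconstruct(image, refined_back, smplx)
```

Keeping each stage as a plain callable makes the ordering of the pipeline explicit and lets any stage be replaced by a stub when inspecting the flow.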
Peer Reviews
Decision: ICLR 2025 Poster
- The problem addressed is quite challenging, and the designed network framework is reasonable, although the process is somewhat lengthy.
- The overall writing of the paper is clear and coherent.
- The results are quite unsatisfactory, as there is a noticeable discrepancy between the synthesized face and the original image. While this issue is mentioned in the limitations section, Figure 9 suggests that the images represent two different individuals.
- The comparison with other methods is insufficient, including approaches like TeCH[1] and GTA[2].
- The qualitative results appear somewhat blurry, with some input images also seeming unclear, particularly in the video results.
- Minor issu
1. Feature Combination: The integration of pose-aware and pixel-space features improves the quality of the coarse GS prediction.
2. Enhanced Quality: This approach surpasses existing single-view human reconstruction methods in output quality.
1. Incremental Novelty: Several papers on 3D reconstruction have already adopted a diffusion model to synthesize additional views as a first step. This paper improves on a similar idea by using a stronger ControlNet signal, i.e. the rendering of the coarse GS. This idea, while effective, is not very inspiring to me.
2. SMPL-X Refinement Concerns: Although justified in ablation studies, the proposed iterative SMPL-X refinement is an EM-like approach, while the previous works always refine SMPL parame
In recent years, 3D Gaussian splatting (3DGS) has shown impressive results in 3D vision fields. Despite the promising results of 3DGS, the exploration of adapting 3D Gaussians for single-view 3D human reconstruction is still lacking. This paper points out the challenges of adapting 3D Gaussians for 3D human reconstruction. I think solving the proposed generate-then-refine pipeline could be one of the good research directions in 3D human reconstruction. The proposed approach sounds plausible, and
I have several concerns below.
1) About the SMPL-X refinement strategy: It is unclear why the proposed SMPL-X refinement pipeline actually helps. As described in Section 3.3.2, the SMPL-X is optimized by two types of losses: a normal loss and a rendering loss. The optimization targets of both losses can be inaccurate: the normal GT comes from a normal estimator, and the rendering GT comes from the coarse Gaussians. Furthermore, as the coarse Gaussians are processed based on the initial SMPL-X (Figure 2), it seems that coarse
Taxonomy
Topics: Medical Image Segmentation Techniques · Advanced Vision and Imaging
Methods: Diffusion · Convolution
