Intrinsic Geometry-Appearance Consistency Optimization for Sparse-View Gaussian Splatting

Kaiqiang Xiong; Rui Peng; Jiahao Wu; Zhanke Wang; Jie Liang; Xiaoyun Zheng; Feng Gao; Ronggang Wang

arXiv:2603.02893·cs.CV·March 4, 2026

TL;DR

This paper introduces MVD-HuGaS, a novel method for 3D human reconstruction from a single image using a multi-view diffusion model, joint optimization of camera poses, and facial refinement, achieving state-of-the-art results.

Contribution

The work presents a multi-view diffusion model fine-tuned on 3D datasets, an alignment module for camera pose estimation, and a facial distortion mitigation technique for improved 3D human reconstruction.

Findings

01. Achieves state-of-the-art performance on the THuman2.0 and 2K2K datasets.

02. Refines facial regions for higher fidelity.

03. Enables high-quality free-view 3D human rendering from a single image.

Abstract

3D human reconstruction from a single image is a challenging problem and has been extensively studied in the literature. Recently, some methods have resorted to diffusion models for guidance, either optimizing a 3D representation via Score Distillation Sampling (SDS) or generating a back-view image to facilitate reconstruction. However, these methods tend to produce artifacts (e.g., flattened human structure or over-smoothed results caused by inconsistent priors from multiple views) and struggle to generalize to in-the-wild images. In this work, we present MVD-HuGaS, which enables free-view 3D human rendering from a single image via a multi-view human diffusion model. We first generate multi-view images from the single reference image with an enhanced multi-view diffusion model, which is fine-tuned on high-quality 3D human datasets to incorporate 3D…

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques