CrowdGaussian: Reconstructing High-Fidelity 3D Gaussians for Human Crowd from a Single Image

Yizheng Song; Yiyu Zhuang; Qipeng Xu; Haixiang Wang; Jiahe Zhu; Jing Tian; Siyu Zhu; Hao Zhu

arXiv:2603.17779 · cs.CV · March 24, 2026

TL;DR

CrowdGaussian introduces a novel framework for reconstructing high-fidelity 3D models of human crowds from a single image, effectively handling occlusions and diverse appearances through self-supervised learning and diffusion-based refinement.

Contribution

The paper presents a unified approach for single-image multi-person 3D reconstruction using 3D Gaussian Splatting, with new self-supervised adaptation and Self-Calibrated Learning strategies.

Findings

01 Produces photorealistic, geometrically coherent multi-person 3D reconstructions

02 Effectively handles occlusions and diverse appearances in crowded scenes

03 Outperforms existing methods in quality and realism

Abstract

Single-view 3D human reconstruction has garnered significant attention in recent years. Despite numerous advancements, prior research has concentrated on reconstructing 3D models from clear, close-up images of individual subjects, often yielding subpar results in the more prevalent multi-person scenarios. Reconstructing 3D human crowd models is a highly intricate task, laden with challenges such as: 1) extensive occlusions, 2) low clarity, and 3) numerous and various appearances. To address this task, we propose CrowdGaussian, a unified framework that directly reconstructs multi-person 3D Gaussian Splatting (3DGS) representations from single-image inputs. To handle occlusions, we devise a self-supervised adaptation pipeline that enables the pretrained large human model to reconstruct complete 3D humans with plausible geometry and appearance from heavily occluded inputs. Furthermore, we…

Taxonomy

Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging