JOintGS: Joint Optimization of Cameras, Bodies and 3D Gaussians for In-the-Wild Monocular Reconstruction
Zihan Lou, Jinlong Fan, Sihan Ma, Yuxiang Yang, Jing Zhang

TL;DR
JOintGS is a unified framework that jointly optimizes camera parameters, human poses, and 3D Gaussian representations from monocular videos, enabling high-fidelity, real-time 3D human avatar reconstruction in unconstrained environments.
Contribution
It introduces a joint optimization approach with foreground-background disentanglement, temporal dynamics, and residual color modeling, improving robustness and reconstruction quality over prior methods.
Findings
Achieves a 2.1 dB PSNR improvement over prior methods on the NeuMan dataset.
Maintains real-time rendering performance.
Shows robustness to noisy initializations.
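To put the PSNR finding in perspective, the short sketch below uses the standard definition PSNR = 10 * log10(MAX^2 / MSE) to convert a gain in dB into the equivalent reduction in mean squared error. It is only an illustration of the metric, not code from the paper; the function names `psnr` and `mse_ratio_for_gain` are hypothetical, and images are assumed to be scaled to [0, 1].

```python
import math

def psnr(mse, max_val=1.0):
    """Standard PSNR in dB for a given mean squared error (assumes pixel range [0, max_val])."""
    return 10.0 * math.log10(max_val ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE must shrink to raise PSNR by gain_db decibels."""
    return 10.0 ** (gain_db / 10.0)

# A 2.1 dB gain corresponds to dividing the MSE by roughly 1.62,
# i.e. about a 38% lower mean squared error at the same pixel range.
ratio = mse_ratio_for_gain(2.1)
reduction_pct = (1.0 - 1.0 / ratio) * 100.0
print(f"MSE ratio: {ratio:.2f}, MSE reduction: {reduction_pct:.0f}%")
```

Because PSNR is logarithmic, the same 2.1 dB gain implies the same relative MSE reduction regardless of the baseline value.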
Abstract
Reconstructing high-fidelity animatable 3D human avatars from monocular RGB videos remains challenging, particularly in unconstrained in-the-wild scenarios where camera parameters and human poses from off-the-shelf methods (e.g., COLMAP, HMR2.0) are often inaccurate. While 3D Gaussian Splatting (3DGS) advances demonstrate impressive rendering quality and real-time performance, they critically depend on precise camera calibration and pose annotations, limiting their applicability in real-world settings. We present JOintGS, a unified framework that jointly optimizes camera extrinsics, human poses, and 3D Gaussian representations from coarse initialization through a synergistic refinement mechanism. Our key insight is that explicit foreground-background disentanglement enables mutual reinforcement: static background Gaussians anchor camera estimation via multi-view consistency; refined cameras improve human…
Taxonomy
Topics: Human Pose and Action Recognition · 3D Shape Modeling and Analysis · Advanced Vision and Imaging
