JOintGS: Joint Optimization of Cameras, Bodies and 3D Gaussians for In-the-Wild Monocular Reconstruction

Zihan Lou; Jinlong Fan; Sihan Ma; Yuxiang Yang; Jing Zhang

arXiv:2602.04317·cs.CV·February 5, 2026

TL;DR

JOintGS is a unified framework that jointly optimizes camera parameters, human poses, and 3D Gaussian representations from monocular videos, enabling high-fidelity, real-time 3D human avatar reconstruction in unconstrained environments.

Contribution

It introduces a joint optimization approach with foreground-background disentanglement, temporal dynamics, and residual color modeling, improving robustness and reconstruction quality over prior methods.

Findings

01. Achieves a 2.1 dB PSNR improvement on the NeuMan dataset.

02. Maintains real-time rendering performance.

03. Shows robustness to noisy initializations.
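
For readers less familiar with the metric in the first finding, the sketch below shows the standard PSNR definition and what a gain in decibels means in terms of mean squared error. It is a generic illustration, not the paper's evaluation code; the function names `psnr` and `mse_ratio_for_gain` and the assumption that pixel values lie in [0, 1] are ours.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(db_gain):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (db_gain / 10.0)


if __name__ == "__main__":
    target = np.zeros((4, 4))
    pred = np.full((4, 4), 0.1)  # uniform error of 0.1, so MSE is 0.01
    print(psnr(pred, target))
    print(mse_ratio_for_gain(2.1))
```

Because PSNR is logarithmic in MSE, a 2.1 dB gain corresponds to the mean squared error dropping by a factor of 10^(0.21), roughly 1.6.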

Abstract

Reconstructing high-fidelity animatable 3D human avatars from monocular RGB videos remains challenging, particularly in unconstrained in-the-wild scenarios where camera parameters and human poses from off-the-shelf methods (e.g., COLMAP, HMR2.0) are often inaccurate. While 3D Gaussian Splatting (3DGS) advances demonstrate impressive rendering quality and real-time performance, they critically depend on precise camera calibration and pose annotations, limiting their applicability in real-world settings. We present JOintGS, a unified framework that jointly optimizes camera extrinsics, human poses, and 3D Gaussian representations from coarse initialization through a synergistic refinement mechanism. Our key insight is that explicit foreground-background disentanglement enables mutual reinforcement: static background Gaussians anchor camera estimation via multi-view consistency; refined cameras improve human…
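
The idea that a static background can anchor the camera while the foreground carries the body estimate can be illustrated with a deliberately tiny toy problem. The sketch below is not the JOintGS implementation: it replaces camera extrinsics and body pose with 2D translations, replaces Gaussians with point sets, and uses plain gradient descent. All names (`joint_refine`, `bg_world`, `fg_canon`, and so on) are hypothetical.

```python
import numpy as np


def joint_refine(bg_world, bg_obs, fg_canon, fg_obs, steps=500, lr=0.1):
    """Jointly estimate a camera shift t and a body shift p by gradient descent.

    Toy observation model (an assumption of this sketch):
      background: obs = world - t
      foreground: obs = canonical + p - t
    """
    t = np.zeros(2)  # coarse camera initialization
    p = np.zeros(2)  # coarse body initialization
    for _ in range(steps):
        r_bg = (bg_world - t) - bg_obs       # background residuals, depend on t only
        r_fg = (fg_canon + p - t) - fg_obs   # foreground residuals, depend on p - t
        # Gradients of mean squared residuals with respect to t and p.
        grad_t = -2.0 * (r_bg.mean(axis=0) + r_fg.mean(axis=0))
        grad_p = 2.0 * r_fg.mean(axis=0)
        t -= lr * grad_t
        p -= lr * grad_p
    return t, p


# Synthetic data with known ground-truth offsets.
rng = np.random.default_rng(0)
bg_world = rng.normal(size=(50, 2))
fg_canon = rng.normal(size=(20, 2))
t_true = np.array([0.3, -0.2])
p_true = np.array([0.1, 0.4])
bg_obs = bg_world - t_true
fg_obs = fg_canon + p_true - t_true

t_hat, p_hat = joint_refine(bg_world, bg_obs, fg_canon, fg_obs)
print("camera shift:", t_hat, "body shift:", p_hat)
```

In this toy model the foreground residual depends only on the difference `p - t`, so the foreground alone cannot separate camera motion from body motion. The background term fixes `t`, which in turn makes `p` identifiable, a simplified analogue of the mutual reinforcement described in the abstract.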

Taxonomy

Topics: Human Pose and Action Recognition · 3D Shape Modeling and Analysis · Advanced Vision and Imaging