From Camera to World: A Plug-and-Play Module for Human Mesh Transformation

Changhai Ma; Ziyu Wu; Yunkang Zhang; Qijun Ying; Boyan Liu; Xiaohui Cai

arXiv:2512.15212 · cs.CV · December 18, 2025

Open Access

TL;DR

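This paper introduces Mesh-Plug, a plug-and-play module that transforms 3D human meshes from camera coordinates to world coordinates. It estimates camera rotation from human-centered cues (RGB images and depth maps rendered from the initial mesh) rather than from environmental cues, and reports better results than state-of-the-art methods on the SPEC-SYN and SPEC-MTP datasets.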

Contribution

The paper presents a novel plug-and-play module that estimates camera rotation using human body cues, enabling precise mesh transformation without environmental information.

Findings

01. Outperforms state-of-the-art methods on the SPEC-SYN and SPEC-MTP datasets

02. Accurately estimates the camera pitch angle from human body configurations

03. Refines mesh orientation and pose through integrated modules

Abstract

Reconstructing accurate 3D human meshes in the world coordinate system from in-the-wild images remains challenging due to the lack of camera rotation information. While existing methods achieve promising results in the camera coordinate system by assuming zero camera rotation, this simplification leads to significant errors when transforming the reconstructed mesh to the world coordinate system. To address this challenge, we propose Mesh-Plug, a plug-and-play module that accurately transforms human meshes from camera coordinates to world coordinates. Our key innovation lies in a human-centered approach that leverages both RGB images and depth maps rendered from the initial mesh to estimate camera rotation parameters, eliminating the dependency on environmental cues. Specifically, we first train a camera rotation prediction module that focuses on the human body's spatial configuration to…
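The geometric step the abstract relies on, moving mesh vertices from the camera frame to the world frame once a camera rotation is known, can be illustrated with a minimal NumPy sketch. This is not the Mesh-Plug implementation: the function names, the pitch/roll parameterization, the axis conventions, and the omission of camera translation are all assumptions made for illustration only.

```python
import numpy as np


def rotation_from_pitch_roll(pitch_rad, roll_rad):
    """Build a world-to-camera rotation matrix (hypothetical convention).

    Assumption: pitch is a rotation about the x-axis and roll is a
    rotation about the z-axis; yaw is ignored.
    """
    cp, sp = np.cos(pitch_rad), np.sin(pitch_rad)
    cr, sr = np.cos(roll_rad), np.sin(roll_rad)
    rot_x = np.array([[1.0, 0.0, 0.0],
                      [0.0, cp, -sp],
                      [0.0, sp, cp]])
    rot_z = np.array([[cr, -sr, 0.0],
                      [sr, cr, 0.0],
                      [0.0, 0.0, 1.0]])
    return rot_z @ rot_x


def camera_to_world(vertices_cam, rot_world_to_cam):
    """Map (N, 3) camera-frame vertices to the world frame.

    With x_cam = R @ x_world, the inverse is x_world = R.T @ x_cam.
    For row-vector vertices this becomes vertices_cam @ R.
    Assumption: translation is ignored (rotation-only transform).
    """
    return vertices_cam @ rot_world_to_cam


# Toy "mesh": three vertices of an upright body in the world frame.
verts_world = np.array([[0.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0],
                        [0.2, 1.7, 0.1]])

# Simulate a camera pitched by 30 degrees looking at that body.
rot = rotation_from_pitch_roll(np.deg2rad(30.0), 0.0)
verts_cam = verts_world @ rot.T

# Using the true rotation recovers the world-frame vertices.
recovered = camera_to_world(verts_cam, rot)

# Assuming zero camera rotation (identity) leaves a residual error.
naive = camera_to_world(verts_cam, np.eye(3))
error_with_rotation = np.abs(recovered - verts_world).max()
error_zero_rotation = np.abs(naive - verts_world).max()
```

In this toy setup `error_with_rotation` is at floating-point precision while `error_zero_rotation` is not, which mirrors the abstract's point that assuming zero camera rotation introduces errors once the mesh is expressed in world coordinates.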

Taxonomy

Topics: Human Pose and Action Recognition · 3D Shape Modeling and Analysis · Advanced Vision and Imaging