TL;DR
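Anny-Fit is a multi-person optimization framework that jointly recovers 3D human meshes for people of all ages in a scene. It is guided by metric depth maps, instance segmentation, 2D keypoints, and VLM-derived semantic attributes such as age and gender, with the aim of improving accuracy and spatial consistency.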
Contribution
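It introduces a joint, camera-space optimization approach for all-age 3D human mesh recovery that fits every person in a scene together rather than one at a time, combining signals from off-the-shelf networks with semantic attributes to handle real-world scenes.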
Findings
Improves 2D reprojection accuracy by 13 to 16
Enhances relative depth ordering by 6 to 7
Reduces 3D estimation error by 9 to 29
Abstract
Recovering 3D human pose and shape from a single image remains a cornerstone of human-centric vision, yet most methods assume adult subjects and optimize each person independently. These assumptions fail in real-world, all-age scenes, where body proportions and depth must be resolved jointly. We introduce Anny-Fit, a multi-person, camera-space optimization framework for all-age 3D human mesh recovery (HMR). Unlike existing per-person fitting methods, Anny-Fit jointly optimizes all individuals directly in the camera coordinate system, enforcing global spatial consistency. At the core of our approach is the use of multiple forms of expert knowledge -- including metric depth maps, instance segmentation, 2D keypoints, and VLM-derived semantic attributes such as age and gender -- each obtained from dedicated off-the-shelf networks. These complementary signals jointly guide the optimization,…
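To make the idea of a joint, camera-space objective concrete, here is a minimal sketch in plain Python. It is not the paper's formulation. The `Person` container, the three loss terms, their weights, and the pairwise depth-ordering penalty are all illustrative assumptions, and a fixed list of root-relative joints stands in for a real body model. It only shows how 2D keypoints and metric depth for several people can feed one scalar loss defined in a shared camera frame.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


@dataclass
class Person:
    # Hypothetical container; a real system would hold body-model parameters.
    joints_local: List[Vec3]   # root-relative 3D joints (metres)
    translation: Vec3          # root position in the camera frame (metres)
    keypoints_2d: List[Vec2]   # detected 2D keypoints (pixels)
    observed_depth: float      # root depth sampled from a metric depth map


def project(p: Vec3, focal: float, center: Vec2) -> Vec2:
    """Pinhole projection of a camera-space point to pixel coordinates."""
    x, y, z = p
    return (focal * x / z + center[0], focal * y / z + center[1])


def joint_loss(people: List[Person], focal: float, center: Vec2,
               w_2d: float = 1.0, w_depth: float = 1.0,
               w_order: float = 1.0) -> float:
    """One scalar objective over all people in the shared camera frame."""
    total = 0.0
    for person in people:
        tx, ty, tz = person.translation
        # 2D reprojection term: projected joints should match the keypoints.
        for (jx, jy, jz), (u, v) in zip(person.joints_local, person.keypoints_2d):
            pu, pv = project((jx + tx, jy + ty, jz + tz), focal, center)
            total += w_2d * ((pu - u) ** 2 + (pv - v) ** 2)
        # Metric depth term: root depth should agree with the depth map.
        total += w_depth * (tz - person.observed_depth) ** 2
    # Pairwise ordering term (an assumption made here so that people are
    # coupled): penalise estimated depths that invert the observed order.
    for i in range(len(people)):
        for j in range(i + 1, len(people)):
            a, b = people[i], people[j]
            if a.observed_depth > b.observed_depth:
                a, b = b, a  # make `a` the person observed as closer
            violation = max(0.0, a.translation[2] - b.translation[2])
            total += w_order * violation ** 2
    return total
```

In this sketch the loss is zero when every person's translation reproduces their keypoints and depth, and it grows when one person is moved in front of another against the observed order. A real fitting pipeline would minimise such a loss with a gradient-based optimiser over pose, shape, and translation, which is omitted here.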
