TL;DR
ARTS is a semi-analytical approach that uses disentangled skeletal representations to improve 3D human mesh recovery from videos, achieving higher accuracy and temporal consistency than previous methods.
Contribution
The paper proposes a novel semi-analytical regressor with skeletal disentanglement and modules for pose, shape, and motion refinement, advancing video-based human mesh recovery.
Findings
Outperforms state-of-the-art methods on 3DPW, MPI-INF-3DHP, and Human3.6M benchmarks.
Achieves higher per-frame accuracy and better temporal consistency.
Effectively decouples pose, shape, and motion for improved estimation.
Abstract
Although existing video-based 3D human mesh recovery methods have made significant progress, simultaneously estimating human pose and shape from low-resolution image features limits their performance. These image features lack sufficient spatial information about the human body and contain various noises (e.g., background, lighting, and clothing), which often results in inaccurate pose and inconsistent motion. Inspired by the rapid advance in human pose estimation, we discover that compared to image features, skeletons inherently contain accurate human pose and motion. Therefore, we propose a novel semi-Analytical Regressor using disenTangled Skeletal representations for human mesh recovery from videos, called ARTS. Specifically, a skeleton estimation and disentanglement module is proposed to estimate the 3D skeletons from a video and decouple them into disentangled skeletal…
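To make the idea of decoupling a skeleton sequence more concrete, here is a minimal sketch in plain Python. The abstract is cut off before it names the disentangled representations, so the split used here (bone lengths as a shape cue, unit bone directions as a pose cue, frame-to-frame joint displacement as a motion cue) is an assumption chosen for illustration, not the paper's definition. The five-joint topology in `PARENTS` and the function names `disentangle`, `recompose`, and `joint_velocities` are likewise hypothetical.

```python
import math

# Hypothetical toy skeleton: PARENTS[i] is the parent of joint i, -1 marks the root.
# Parents always come before their children. This is NOT the topology used by ARTS.
PARENTS = [-1, 0, 1, 0, 3]


def disentangle(skeleton):
    """Split one frame of 3D joints into bone lengths and unit bone directions."""
    lengths, directions = [], []
    for joint, parent in enumerate(PARENTS):
        if parent < 0:
            continue  # the root joint has no incoming bone
        bone = [c - p for c, p in zip(skeleton[joint], skeleton[parent])]
        length = math.sqrt(sum(v * v for v in bone))
        lengths.append(length)
        # Guard against zero-length bones to avoid division by zero.
        directions.append([v / length for v in bone] if length > 0 else [0.0, 0.0, 0.0])
    return lengths, directions


def recompose(root, lengths, directions):
    """Rebuild joint positions from a root position, bone lengths, and directions."""
    joints = [list(root)]
    for bone_idx, (length, direction) in enumerate(zip(lengths, directions)):
        parent = PARENTS[bone_idx + 1]  # bone k ends at joint k + 1
        joints.append([p + length * d for p, d in zip(joints[parent], direction)])
    return joints


def joint_velocities(sequence):
    """Frame-to-frame joint displacement, used here as a simple motion cue."""
    return [
        [[c - p for c, p in zip(cur_joint, prev_joint)]
         for cur_joint, prev_joint in zip(cur, prev)]
        for prev, cur in zip(sequence, sequence[1:])
    ]


frame = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 2.0, 0.0],
         [1.0, 0.0, 0.0], [1.0, -1.0, 0.0]]
lengths, directions = disentangle(frame)
rebuilt = recompose(frame[0], lengths, directions)
```

In this toy setup the bone lengths stay constant when the figure moves, while the directions and velocities change, which is the intuition behind treating shape, pose, and motion as separate signals.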
