SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation
Wanqi Yin, Zhongang Cai, Ruisi Wang, Ailing Zeng, Chen Wei, Qingping Sun, Haiyi Mei, Yanjun Wang, Hui En Pang, Mingyuan Zhang, Lei Zhang, Chen Change Loy, Atsushi Yamashita, Lei Yang, Ziwei Liu

TL;DR
This paper explores scaling up expressive human pose and shape estimation models using large datasets and vision transformers, achieving state-of-the-art results across multiple benchmarks.
Contribution
It introduces a systematic approach to data and model scaling for EHPS, demonstrating significant performance improvements with large datasets and transformer architectures.
Findings
Performance gains from adding data reach diminishing returns at 10 million training instances.
Large models and data lead to strong performance and transferability.
State-of-the-art results on seven benchmarks, including a new hand dataset.
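As a rough illustration of what "diminishing returns" means in a data-scaling study, the sketch below finds the first training-set size at which the relative error reduction over the previous size drops below a threshold. The function names, the 2% threshold, and the numbers are all hypothetical placeholders in arbitrary units; they are not taken from the paper and do not reproduce its evaluation protocol.

```python
from typing import List, Optional


def relative_gains(errors: List[float]) -> List[float]:
    """Relative error reduction between each pair of consecutive runs."""
    return [(prev - cur) / prev for prev, cur in zip(errors, errors[1:])]


def saturation_point(
    sizes: List[int], errors: List[float], threshold: float = 0.02
) -> Optional[int]:
    """Return the first size whose gain over the previous size is below threshold.

    Returns None if every step still improves by at least `threshold`.
    """
    for size, gain in zip(sizes[1:], relative_gains(errors)):
        if gain < threshold:
            return size
    return None


# Hypothetical placeholder curve (arbitrary units), NOT results from the paper.
sizes = [1, 2, 4, 8, 16]
errors = [100.0, 80.0, 70.0, 66.0, 65.5]
saturated_at = saturation_point(sizes, errors)
```

With these placeholder values, the last step improves error by less than 1%, so `saturated_at` is the final size, 16. The threshold is a judgment call; a real study would also account for run-to-run variance before declaring saturation.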
Abstract
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications. Despite encouraging progress, current state-of-the-art methods focus on training innovative architectural designs on confined datasets. In this work, we investigate the impact of scaling up EHPS towards a family of generalist foundation models. 1) For data scaling, we perform a systematic investigation on 40 EHPS datasets, encompassing a wide range of scenarios that a model trained on any single dataset cannot handle. More importantly, capitalizing on insights obtained from the extensive benchmarking process, we optimize our training scheme and select datasets that lead to a significant leap in EHPS capabilities. Ultimately, we achieve diminishing returns at 10M training instances from diverse data sources. 2) For model scaling, we take advantage of vision…
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Hand Gesture Recognition Systems
Methods: Balanced Selection · Focus
