PoseMoE: Mixture-of-Experts Network for Monocular 3D Human Pose Estimation
Mengyuan Liu, Jiajie Liu, Jinyan Zhang, Wenhao Li, Junsong Yuan

TL;DR
PoseMoE introduces a mixture-of-experts network that disentangles 2D pose and depth features, improving monocular 3D human pose estimation accuracy by reducing the influence of uncertain depth features.
Contribution
The paper proposes PoseMoE, a novel mixture-of-experts architecture that separately refines 2D pose and depth features, with a cross-expert module for better feature aggregation, addressing limitations of previous entangled encoding methods.
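As a rough illustration of what separately refining 2D pose and depth features can look like, here is a minimal pure-Python sketch. It is not the authors' architecture: the class name `ToyDisentangledLifter`, the layer sizes, the residual experts, and the concatenation-based fusion are all assumptions made for illustration. It works per joint and omits the spatio-temporal aggregation and any expert routing the paper may use.

```python
import random


def init_linear(n_in, n_out, rng):
    """Return (weights, bias) for a small randomly initialised linear layer."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b


def linear(x, w, b):
    """Compute y = W x + b for one feature vector stored as a plain list."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, a) for a in v]


class ToyDisentangledLifter:
    """Hypothetical two-stream lifter: one stream for 2D pose, one for depth."""

    def __init__(self, dim=8, seed=0):
        rng = random.Random(seed)
        self.embed_2d = init_linear(2, dim, rng)     # embeds the detected (x, y)
        self.embed_depth = init_linear(2, dim, rng)  # initial depth features, estimated from 2D
        self.expert_2d = init_linear(dim, dim, rng)  # refines 2D features only
        self.expert_depth = init_linear(dim, dim, rng)  # refines depth features only
        self.cross = init_linear(2 * dim, dim, rng)  # assumed fusion of both streams
        self.head = init_linear(dim, 3, rng)         # regresses (x, y, z)

    def forward(self, pose_2d):
        """Map a list of [x, y] joints to a list of [x, y, z] joints."""
        pose_3d = []
        for joint in pose_2d:
            f2d = relu(linear(joint, *self.embed_2d))
            fz = relu(linear(joint, *self.embed_depth))
            # Each expert refines its own stream (residual update), so the
            # uncertain depth features do not touch the 2D features here.
            f2d = [a + r for a, r in zip(f2d, relu(linear(f2d, *self.expert_2d)))]
            fz = [a + r for a, r in zip(fz, relu(linear(fz, *self.expert_depth)))]
            # The streams are combined only after separate refinement.
            fused = relu(linear(f2d + fz, *self.cross))
            pose_3d.append(linear(fused, *self.head))
        return pose_3d
```

Calling `ToyDisentangledLifter().forward([[0.1, 0.2], [0.3, 0.4]])` returns one `[x, y, z]` list per input joint. The weights are random and untrained, so the values carry no meaning; the point is only that the two feature streams stay separate until the fusion step.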
Findings
Outperforms existing lifting-based methods on Human3.6M, MPI-INF-3DHP, and 3DPW datasets.
Effectively disentangles 2D pose and depth features, reducing depth uncertainty impact.
Enhances feature representation through cross-expert spatio-temporal aggregation.
Abstract
Lifting-based methods have dominated monocular 3D human pose estimation by leveraging detected 2D poses as intermediate representations. The 2D component of the final 3D human pose benefits from the detected 2D poses, whereas its depth counterpart must be estimated from scratch. Lifting-based methods encode the detected 2D pose and the unknown depth in an entangled feature space, which explicitly introduces depth uncertainty into the detected 2D pose and thereby limits overall estimation accuracy. This work reveals that the depth representation is pivotal for the estimation process. Specifically, when depth is in an initial, completely unknown state, jointly encoding depth features with 2D pose features is detrimental to the estimation process. In contrast, when depth is initially refined to a more dependable state via network-based estimation, encoding it together with 2D pose information…
Taxonomy
Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Human Motion and Animation
