End-to-End Human Pose and Mesh Reconstruction with Transformers
Kevin Lin, Lijuan Wang, Zicheng Liu

TL;DR
METRO introduces a transformer-based approach for 3D human pose and mesh reconstruction from a single image, outperforming existing methods and extending to hand reconstruction without relying on parametric models.
Contribution
The paper proposes a transformer encoder for joint modeling of vertices and joints, enabling non-parametric, flexible 3D reconstruction of humans and hands from images.
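As a rough illustration of joint modeling, the sketch below runs one round of unrestricted self-attention over a single sequence that concatenates joint tokens and vertex tokens. It is a toy under stated assumptions, not the paper's implementation: the token counts, the feature size, the random features, and the identity query/key/value projections are all invented here for brevity.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), which a real transformer layer would learn.
    """
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        row = softmax(scores)  # no topology mask: every token sees every token
        weights.append(row)
        outputs.append(
            [sum(row[j] * tokens[j][c] for j in range(len(tokens))) for c in range(dim)]
        )
    return outputs, weights


# Toy sizes chosen for readability; they are not the paper's settings.
NUM_JOINTS, NUM_VERTICES, DIM = 3, 5, 4
random.seed(0)
joint_tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_JOINTS)]
vertex_tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_VERTICES)]

# One sequence holds both kinds of token, so attention covers
# vertex-vertex and vertex-joint pairs alike.
tokens = joint_tokens + vertex_tokens
updated, attn = self_attention(tokens)
```

Because no adjacency mask is applied, every attention weight is positive, which is the sense in which any vertex can attend to any other vertex or joint regardless of mesh connectivity.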
Findings
METRO achieves state-of-the-art results on the Human3.6M and 3DPW datasets.
It outperforms existing methods on the FreiHAND hand dataset.
Masked vertex modeling makes it more robust to partial occlusions.
Abstract
We present a new method, called MEsh TRansfOrmer (METRO), to reconstruct 3D human pose and mesh vertices from a single image. Our method uses a transformer encoder to jointly model vertex-vertex and vertex-joint interactions, and outputs 3D joint coordinates and mesh vertices simultaneously. Compared to existing techniques that regress pose and shape parameters, METRO does not rely on any parametric mesh models like SMPL, thus it can be easily extended to other objects such as hands. We further relax the mesh topology and allow the transformer self-attention mechanism to freely attend between any two vertices, making it possible to learn non-local relationships among mesh vertices and joints. With the proposed masked vertex modeling, our method is more robust and effective in handling challenging situations like partial occlusions. METRO generates new state-of-the-art results for human…
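The masking idea mentioned in the abstract can be sketched as follows: during training, a random subset of the joint and vertex query tokens is hidden, and the model is still asked to predict every joint and vertex, so it has to rely on the remaining tokens. This is a minimal sketch under assumptions: the helper name `mask_queries`, the 30% ratio, and the all-zero mask token are hypothetical choices made here and are not taken from the source.

```python
import random


def mask_queries(queries, mask_ratio, rng, mask_token=None):
    """Replace a random fraction of query tokens with a mask token.

    Hypothetical helper for illustration. Returns the masked copy and
    the sorted indices that were masked; the input list is not modified.
    """
    count = len(queries)
    num_masked = int(round(mask_ratio * count))
    masked_idx = set(rng.sample(range(count), num_masked))
    if mask_token is None:
        # Assumption: an all-zero vector stands in for the mask token.
        mask_token = [0.0] * len(queries[0])
    masked = [
        list(mask_token) if i in masked_idx else list(query)
        for i, query in enumerate(queries)
    ]
    return masked, sorted(masked_idx)


rng = random.Random(0)
# Ten toy 2-D query tokens standing in for joint and vertex queries.
queries = [[float(i), float(i) + 0.5] for i in range(10)]
masked_queries, masked_idx = mask_queries(queries, mask_ratio=0.3, rng=rng)
```

In a training loop, `masked_queries` would be fed to the encoder while the loss is still computed against all ground-truth joints and vertices, including the masked positions.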
Taxonomy
Topics: Human Pose and Action Recognition · 3D Shape Modeling and Analysis · Advanced Neural Network Applications
