Human Pose as Compositional Tokens
Zigang Geng, Chunyu Wang, Yixuan Wei, Ze Liu, Houqiang Li, and Han Hu

TL;DR
This paper introduces Pose as Compositional Tokens (PCT), a structured representation of human pose that models joint dependencies, enabling more accurate and occlusion-robust pose estimation through a classification approach.
Contribution
It proposes a compositional, token-based pose representation and formulates pose estimation as a classification task, matching or improving the accuracy of existing methods while remaining robust to occlusion.
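The representation step can be illustrated with a minimal sketch of token quantization. It assumes that some encoder (not shown, and hypothetical here) has already produced M feature vectors, one per sub-structure of joints, and that all tokens share one codebook of V entries. Each token is assigned the index of its nearest codebook entry, so a pose becomes M integers. Because the tokens combine freely, M tokens over V entries can express V^M token combinations while storing only V codebook vectors, which is one way to read the "small reconstruction error at a low cost" claim. The function names and the shared-codebook layout are illustrative assumptions, not the authors' code.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize_tokens(token_features: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each of the M token features to the index of its nearest codebook entry.

    token_features: M vectors, one per (hypothetical) sub-structure of joints.
    codebook: V shared entries; each returned index is a token category in [0, V).
    Ties are broken in favor of the lower index.
    """
    indices = []
    for feature in token_features:
        best = min(range(len(codebook)),
                   key=lambda v: squared_distance(feature, codebook[v]))
        indices.append(best)
    return indices


def num_combinations(num_tokens: int, codebook_size: int) -> int:
    """Distinct token combinations for M tokens drawn from V entries: V ** M."""
    return codebook_size ** num_tokens
```

For example, with a toy codebook of V = 4 entries and M = 3 tokens, `num_combinations(3, 4)` returns 64, even though only four codebook vectors are stored.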
Findings
Achieves pose estimation accuracy better than or comparable to existing methods.
Continues to perform well when body joints are occluded.
Uses a classification-based approach for pose estimation.
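One way to write a classification objective over the M tokens is a per-token cross-entropy. This is a standard formulation assumed here for illustration, not necessarily the paper's exact loss. In it, ℓ_{m,v} denotes the classifier's logit for token m and category v (one of V codebook entries), and k_m is the ground-truth category of token m, assumed to come from tokenizing the annotated pose with the pre-learned tokenizer.

```latex
\mathcal{L}_{\text{cls}}
  = -\frac{1}{M}\sum_{m=1}^{M}
    \log \frac{\exp\left(\ell_{m,k_m}\right)}
              {\sum_{v=1}^{V}\exp\left(\ell_{m,v}\right)}
```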
Abstract
Human pose is typically represented by a coordinate vector of body joints or their heatmap embeddings. While these representations are easy to process, they admit unrealistic pose estimates because they do not model the dependencies between body joints. In this paper, we present a structured representation, named Pose as Compositional Tokens (PCT), to explore the joint dependency. It represents a pose by M discrete tokens, each characterizing a sub-structure with several interdependent joints. The compositional design enables it to achieve a small reconstruction error at a low cost. Then we cast pose estimation as a classification task. In particular, we learn a classifier to predict the categories of the M tokens from an image. A pre-learned decoder network is used to recover the pose from the tokens without further post-processing. We show that it achieves better or comparable pose estimation…
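The inference pipeline the abstract describes (classify the M tokens, then decode them into a pose) can be sketched as follows. This is a toy illustration under stated assumptions: the classifier is assumed to output an M × V matrix of logits, each token is decided by a hard argmax, and the pre-learned decoder network is replaced by a single hypothetical linear layer that maps the concatenated codebook entries to K (x, y) joints. The real decoder is a learned network, and the paper may combine codebook entries differently.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def predict_token_ids(logits: Matrix) -> List[int]:
    """Classification step: pick one of V categories for each of the M tokens."""
    return [argmax(row) for row in logits]


def decode_pose(token_ids: List[int], codebook: Matrix,
                weight: Matrix, bias: List[float]) -> List[Tuple[float, float]]:
    """Toy stand-in for the pre-learned decoder (hypothetical simplification).

    Looks up each token's codebook entry, concatenates the entries, applies one
    linear layer (weight has 2K rows, bias has 2K values), and returns K joints.
    """
    features = [value for t in token_ids for value in codebook[t]]
    flat = [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weight, bias)]
    # Pair consecutive outputs as (x, y) coordinates; no post-processing step.
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]
```

With two tokens over a three-entry, one-dimensional codebook, `predict_token_ids([[0.1, 2.0, -1.0], [0.5, 0.2, 3.0]])` selects categories `[1, 2]`, and decoding them with an identity weight and bias `[0.5, 0.5]` yields the single joint `(1.5, 2.5)`. The coordinates come straight out of the decoder, so no heatmap decoding appears in the sketch.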
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Gait Recognition and Analysis
Methods: Heatmap
