Recognition of Freely Selected Keypoints on Human Limbs
Katja Ludwig, Daniel Kienzle, Rainer Lienhart

TL;DR
This paper introduces a Vision Transformer-based method for detecting arbitrary keypoints on human limbs. It goes beyond the fixed keypoint sets of standard pose estimation datasets, enabling more flexible and detailed human pose analysis.
Contribution
It proposes two novel approaches to encode arbitrary limb keypoints within a Transformer-based architecture, allowing detection without retraining on new keypoints.
Findings
Achieves similar accuracy to TokenPose on fixed keypoints
Capable of detecting arbitrary limb keypoints
Does not require retraining for new keypoints
Abstract
Nearly all Human Pose Estimation (HPE) datasets consist of a fixed set of keypoints. Standard HPE models trained on such datasets can only detect these keypoints. If more points are desired, they have to be manually annotated and the model needs to be retrained. Our approach leverages the Vision Transformer architecture to extend the capability of the model to detect arbitrary keypoints on the limbs of persons. We propose two different approaches to encode the desired keypoints. (1) Each keypoint is defined by its position along the line between the two enclosing keypoints from the fixed set and its relative distance between this line and the edge of the limb. (2) Keypoints are defined as coordinates on a norm pose. Both approaches are based on the TokenPose architecture, while the keypoint tokens that correspond to the fixed keypoints are replaced with our novel module. Experiments…
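To make encoding approach (1) concrete, the following is a minimal geometric sketch of how a limb keypoint could be expressed as a position along the line between two enclosing keypoints plus a relative offset toward the limb edge. It is an illustration under stated assumptions, not the authors' implementation: the function names `encode_limb_keypoint` and `decode_limb_keypoint` are hypothetical, and the distance from the line to the limb edge (`edge_dist`) is assumed to be supplied by the caller, since the excerpt above does not say how it is obtained.

```python
import math


def encode_limb_keypoint(a, b, q, edge_dist):
    """Encode point q relative to the segment a->b (hypothetical helper).

    Returns (t, d):
      t: fractional position of q's projection along a->b (0 at a, 1 at b)
      d: signed perpendicular offset of q from the line, divided by
         edge_dist (assumed distance from the line to the limb edge)
    """
    ax, ay = a
    bx, by = b
    qx, qy = q
    vx, vy = bx - ax, by - ay
    length_sq = vx * vx + vy * vy
    if length_sq == 0 or edge_dist <= 0:
        raise ValueError("enclosing keypoints must differ and edge_dist must be positive")
    # Scalar projection of (q - a) onto (b - a), normalized by segment length squared.
    t = ((qx - ax) * vx + (qy - ay) * vy) / length_sq
    # Signed perpendicular distance via the 2D cross product.
    perp = (vx * (qy - ay) - vy * (qx - ax)) / math.sqrt(length_sq)
    return t, perp / edge_dist


def decode_limb_keypoint(a, b, t, d, edge_dist):
    """Invert encode_limb_keypoint: recover image coordinates from (t, d)."""
    ax, ay = a
    bx, by = b
    vx, vy = bx - ax, by - ay
    length = math.hypot(vx, vy)
    if length == 0:
        raise ValueError("enclosing keypoints must differ")
    # Unit normal matching the sign convention used in the encoder.
    nx, ny = -vy / length, vx / length
    return (ax + t * vx + d * edge_dist * nx,
            ay + t * vy + d * edge_dist * ny)


# Example with made-up coordinates for two enclosing keypoints.
elbow, wrist = (0.0, 0.0), (10.0, 0.0)
t, d = encode_limb_keypoint(elbow, wrist, (4.0, 1.0), edge_dist=2.0)
point = decode_limb_keypoint(elbow, wrist, t, d, edge_dist=2.0)
```

In this sketch a value of `d = 0` places the point on the line between the enclosing keypoints and `|d| = 1` places it on the assumed limb edge, so the pair `(t, d)` stays meaningful regardless of the limb's size in the image.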
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Gait Recognition and Analysis
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Dense Connections · Multi-Head Attention · Layer Normalization · Residual Connection · Softmax
