Detecting Arbitrary Keypoints on Limbs and Skis with Sparse Partly Correct Segmentation Masks
Katja Ludwig, Daniel Kienzle, Julian Lorenz, Rainer Lienhart

TL;DR
This paper introduces a Vision Transformer-based method for detecting arbitrary keypoints on limbs and skis of ski jumpers using only a few partly correct segmentation masks, reducing annotation costs.
Contribution
It presents a novel approach that leverages partly correct segmentation masks for training, enabling flexible keypoint detection without extensive manual annotations.
Findings
Few partly correct segmentation masks suffice for training.
The method effectively detects arbitrary keypoints on limbs and skis.
Training techniques like pseudo labels improve detection performance.
Abstract
Analyses based on the body posture are crucial for top-class athletes in many sports disciplines. If at all, coaches label only the most important keypoints, since manual annotations are very costly. This paper proposes a method to detect arbitrary keypoints on the limbs and skis of professional ski jumpers that requires a few, only partly correct segmentation masks during training. Our model is based on the Vision Transformer architecture with a special design for the input tokens to query for the desired keypoints. Since we use segmentation masks only to generate ground truth labels for the freely selectable keypoints, partly correct segmentation masks are sufficient for our training procedure. Hence, there is no need for costly hand-annotated segmentation masks. We analyze different training techniques for freely selected and standard keypoints, including pseudo labels, and show in…
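The abstract says segmentation masks are used only to generate ground truth labels for the freely selectable keypoints, but this excerpt does not say how. The following is a minimal sketch of one way such a label could be derived, and it is an assumption rather than the authors' procedure: pick a point along the line between two standard keypoints, then move across the limb until the mask boundary is reached on both sides. The function name `keypoint_from_mask` and the parameters `t_length`, `t_thickness` and `max_steps` are hypothetical. Only the mask region around the queried limb has to be right, which is one plausible reading of why partly correct masks are sufficient.

```python
import numpy as np


def keypoint_from_mask(mask, p_start, p_end, t_length, t_thickness, max_steps=200):
    """Derive one keypoint label from a binary limb mask (illustrative only).

    mask:        2D bool array indexed as mask[y, x]
    p_start/end: standard keypoints enclosing the limb, given as (x, y)
    t_length:    position along the limb, 0 = p_start, 1 = p_end
    t_thickness: position across the limb, 0 = one edge, 1 = the other edge
    Returns an (x, y) array, or None if the mask is unusable at that spot.
    """
    p_start = np.asarray(p_start, dtype=float)
    p_end = np.asarray(p_end, dtype=float)
    axis = p_end - p_start
    length = np.linalg.norm(axis)
    if length == 0:
        return None

    # Point on the line between the two standard keypoints.
    center = p_start + t_length * axis
    # Unit vector orthogonal to the limb axis.
    normal = np.array([-axis[1], axis[0]]) / length

    def inside(pt):
        x, y = int(round(pt[0])), int(round(pt[1]))
        h, w = mask.shape
        return 0 <= x < w and 0 <= y < h and bool(mask[y, x])

    # If the mask is wrong here, produce no label instead of a bad one.
    if not inside(center):
        return None

    def walk(direction):
        # Step one pixel at a time until the next step would leave the mask.
        pt = center.copy()
        for _ in range(max_steps):
            nxt = pt + direction
            if not inside(nxt):
                break
            pt = nxt
        return pt

    edge_a = walk(-normal)
    edge_b = walk(normal)
    # Interpolate between the two limb edges.
    return edge_a + t_thickness * (edge_b - edge_a)


if __name__ == "__main__":
    # Synthetic horizontal "limb": rows 8..12, columns 2..17.
    demo_mask = np.zeros((20, 20), dtype=bool)
    demo_mask[8:13, 2:18] = True
    print(keypoint_from_mask(demo_mask, (2, 10), (17, 10), 0.4, 0.5))
```

Returning `None` when the projected point falls outside the mask is a design choice of this sketch: a training loop could simply skip such samples, so that errors elsewhere in the mask never produce a label.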
Videos
Detecting Arbitrary Keypoints on Limbs and Skis with Sparse Partly Correct Segmentation Masks (YouTube)
Taxonomy
Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Advanced Neural Network Applications
Methods: Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Label Smoothing · Linear Layer · Adam · Softmax · Absolute Position Encodings · Dropout · Byte Pair Encoding
