TL;DR
OpenPifPaf is a real-time, single-stage neural network framework that detects and tracks semantic keypoints across space and time. It applies to diverse perception tasks, including human, vehicle, and animal keypoint detection.
Contribution
The paper presents a novel neural network architecture using Composite Fields for joint spatio-temporal keypoint detection and association, enabling real-time performance.
Findings
Achieves competitive accuracy on COCO, CrowdPose, and PoseTrack datasets.
Runs an order of magnitude faster than previous methods.
Generalizes to various semantic keypoints beyond humans.
Abstract
Many image-based perception tasks can be formulated as detecting, associating and tracking semantic keypoints, e.g., human body pose estimation and tracking. In this work, we present a general framework that jointly detects and forms spatio-temporal keypoint associations in a single stage, making this the first real-time pose detection and tracking algorithm. We present a generic neural network architecture that uses Composite Fields to detect and construct a spatio-temporal pose which is a single, connected graph whose nodes are the semantic keypoints (e.g., a person's body joints) in multiple frames. For the temporal associations, we introduce the Temporal Composite Association Field (TCAF) which requires an extended network architecture and training method beyond previous Composite Fields. Our experiments show competitive accuracy while being an order of magnitude faster on multiple…
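The spatio-temporal pose described in the abstract, a single connected graph whose nodes are keypoints in multiple frames, can be illustrated with a small data structure. The following is a minimal sketch and not the OpenPifPaf implementation: the class `SpatioTemporalPose`, its methods, the three-joint skeleton, and the coordinates are all hypothetical names and values chosen for illustration. It only shows how spatial edges (within a frame) and temporal edges (across frames) join per-frame poses into one graph; in the paper, the associations themselves are predicted by the network's Composite Fields, which this sketch does not model.

```python
from collections import defaultdict, deque


class SpatioTemporalPose:
    """Hypothetical container: nodes are (frame, keypoint_name) pairs."""

    def __init__(self):
        self.nodes = {}                    # (frame, name) -> (x, y)
        self.adjacency = defaultdict(set)  # undirected edges

    def add_keypoint(self, frame, name, x, y):
        self.nodes[(frame, name)] = (x, y)

    def connect(self, a, b):
        """Add a spatial edge (same frame) or a temporal edge (different frames)."""
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both endpoints must be added before connecting")
        self.adjacency[a].add(b)
        self.adjacency[b].add(a)

    def is_connected(self):
        """Breadth-first search: True if every node is reachable from any other."""
        if not self.nodes:
            return True
        start = next(iter(self.nodes))
        seen = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in self.adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return len(seen) == len(self.nodes)


def build_example():
    """Two frames of an illustrative three-joint arm, spatial edges only."""
    pose = SpatioTemporalPose()
    joints = ["left_shoulder", "left_elbow", "left_wrist"]
    skeleton = [("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist")]
    for frame in (0, 1):
        for i, name in enumerate(joints):
            pose.add_keypoint(frame, name, 10.0 + frame, 20.0 + 5.0 * i)
        for a, b in skeleton:
            pose.connect((frame, a), (frame, b))
    return pose


if __name__ == "__main__":
    pose = build_example()
    print("connected with spatial edges only:", pose.is_connected())
    # A single temporal edge links the same joint across the two frames.
    pose.connect((0, "left_shoulder"), (1, "left_shoulder"))
    print("connected after adding a temporal edge:", pose.is_connected())
```

With only spatial edges, the two frames form separate components; one temporal edge between the same joint in consecutive frames is enough to make the graph connected, which is the property the abstract attributes to a spatio-temporal pose.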
Taxonomy
Methods: Composite Fields
