TL;DR
This paper introduces a lightweight joint relation extractor for explicit modeling of joint relationships in video-based human pose estimation, improving occlusion handling and achieving state-of-the-art results.
Contribution
The paper proposes a novel joint relation extractor (JRE) that explicitly models joint relationships and infers occluded joints, enhancing pose estimation accuracy in videos.
Findings
Achieves state-of-the-art results on the Penn Action, Sub-JHMDB, and PoseTrack2018 datasets.
The JRE improves the performance of backbone models on the COCO2017 dataset.
The model infers occluded joints from the learned joint relationships.
Abstract
Video-based human pose estimation (VHPE) is a vital yet challenging task. While deep learning methods have made significant progress on VHPE, most approaches to this task implicitly model the long-range interaction between joints by enlarging the receptive field of the convolution. Unlike prior methods, we design a lightweight and plug-and-play joint relation extractor (JRE) to model the associative relationship between joints explicitly and automatically. The JRE takes the pseudo heatmaps of joints as input and calculates the similarity between pseudo heatmaps. In this way, the JRE flexibly learns the relationship between any two joints, allowing it to learn the rich spatial configuration of human poses. Moreover, the JRE can infer invisible joints according to the relationship between joints, which is beneficial for the model to locate occluded joints. Then, combined with…
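The abstract says the JRE takes per-joint pseudo heatmaps and computes the similarity between them. The sketch below illustrates that general idea only and is not the paper's implementation. The cosine similarity measure, the softmax mixing step, and every function name here (`joint_relation_matrix`, `propagate`, `gaussian_heatmap`) are assumptions made for illustration; the actual JRE may use a different similarity function and learned parameters.

```python
import numpy as np


def joint_relation_matrix(heatmaps, eps=1e-8):
    """Pairwise cosine similarity between K heatmaps of shape (K, H, W).

    Assumption: cosine similarity stands in for whatever similarity
    measure the paper actually uses.
    """
    k = heatmaps.shape[0]
    flat = heatmaps.reshape(k, -1).astype(np.float64)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.maximum(norms, eps)  # guard against all-zero heatmaps
    return unit @ unit.T  # (K, K) relation matrix


def propagate(heatmaps, relation):
    """Mix heatmaps using row-wise softmax of the relation matrix.

    Assumption: a hypothetical way to let related joints inform each
    other, e.g. to support a joint whose own heatmap is weak.
    """
    k, h, w = heatmaps.shape
    logits = relation - relation.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights @ heatmaps.reshape(k, -1)).reshape(k, h, w)


def gaussian_heatmap(h, w, cy, cx, sigma=2.0):
    """Synthetic stand-in for a pseudo heatmap centred at (cy, cx)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))


# Three toy joints: two close together, one far away.
heatmaps = np.stack([
    gaussian_heatmap(32, 32, 10, 10),
    gaussian_heatmap(32, 32, 11, 11),
    gaussian_heatmap(32, 32, 25, 25),
])
relation = joint_relation_matrix(heatmaps)
refined = propagate(heatmaps, relation)
```

In this toy setup the two nearby joints receive a higher similarity score than either does with the distant joint, which is the kind of pairwise signal a relation module could exploit when one joint is occluded.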
Taxonomy
Methods: Heatmap
