OTPose: Occlusion-Aware Transformer for Pose Estimation in Sparsely-Labeled Videos
Kyung-Min Jin, Gun-Hee Lee, Seong-Whan Lee

TL;DR
OTPose introduces an occlusion-aware transformer framework that effectively estimates multi-human poses in sparsely-labeled videos, handling occlusion and motion blur with semi-supervised learning and temporal encoding.
Contribution
The paper presents a novel attention mask for occlusion handling and employs transformers for temporal dependency encoding in pose estimation from sparse video annotations.
Findings
State-of-the-art results on PoseTrack2017 and PoseTrack2018 datasets.
Robustness to occlusion and motion blur under sparse annotation.
Effective semi-supervised occlusion-aware heatmaps.
Abstract
Although many approaches for multi-human pose estimation in videos have shown strong results, they require densely annotated data, which entails excessive manual labor. Furthermore, occlusion and motion blur inevitably lead to poor estimation performance. To address these problems, we propose a method that leverages an attention mask for occluded joints and encodes temporal dependency between frames using transformers. First, our framework composes different combinations of sparsely annotated frames that denote the track of the overall joint movement. We propose an occlusion attention mask from these combinations that enables encoding occlusion-aware heatmaps as a semi-supervised task. Second, the proposed temporal encoder employs a transformer architecture to effectively aggregate the temporal relationship and keypoint-wise attention from each time step and accurately…
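The abstract is truncated and does not spell out how the occlusion attention mask is built or applied, so the following is only a generic sketch of masked attention, not the authors' implementation. It assumes each token is one joint's feature vector at one time step and that a per-token visibility flag is already available; the function name `masked_attention` and the `visible` argument are hypothetical. Occluded tokens are excluded as keys, so the aggregated feature draws only on frames where the joint is visible.

```python
import math


def masked_attention(queries, keys, values, visible):
    """Scaled dot-product attention that ignores keys flagged as occluded.

    queries, keys, values: lists of float vectors, one vector per token.
    visible: list of bools, one per key token (False means occluded).
    The visibility flags are an assumed input, not derived as in the paper.
    """
    if not any(visible):
        raise ValueError("at least one key token must be visible")
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Occluded keys get a score of -inf, so softmax assigns them weight 0.
        scores = [
            scale * sum(qi * ki for qi, ki in zip(q, k)) if vis else float("-inf")
            for k, vis in zip(keys, visible)
        ]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of value vectors, one output dimension at a time.
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


# One joint observed over three frames; the middle frame is occluded.
keys = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
values = [[1.0, 0.0], [100.0, 100.0], [0.0, 1.0]]
result = masked_attention([[1.0, 0.0]], keys, values, [True, False, True])
```

In this toy input the occluded frame carries an outlier value, and the mask keeps it from contaminating the aggregated feature: `result[0]` is the average of the two visible frames.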
Taxonomy
Topics: Human Pose and Action Recognition · Diabetic Foot Ulcer Assessment and Management · Advanced Vision and Imaging
