Referring Human Pose and Mask Estimation in the Wild
Bo Miao, Mingtao Feng, Zijie Wu, Mohammed Bennamoun, Yongsheng Gao, Ajmal Mian

TL;DR
This paper introduces Referring Human Pose and Mask Estimation (R-HPM), a new task in which a text or positional prompt identifies a person in an image and the model estimates that person's pose and mask. The task is supported by a large annotated dataset, RefHuman, and a novel promptable model, UniPHD.
Contribution
The paper presents the first end-to-end promptable approach for R-HPM and introduces the RefHuman dataset with extensive annotations for this task.
Findings
UniPHD achieves high-quality, prompt-conditioned human pose and mask estimation.
The RefHuman dataset contains over 50,000 annotated instances, each with prompt, keypoint, and mask annotations.
The approach outperforms existing methods on benchmark datasets.
Abstract
We introduce Referring Human Pose and Mask Estimation (R-HPM) in the wild, where either a text or positional prompt specifies the person of interest in an image. This new task holds significant potential for human-centric applications such as assistive robotics and sports analysis. In contrast to previous works, R-HPM (i) ensures high-quality, identity-aware results corresponding to the referred person, and (ii) simultaneously predicts human pose and mask for a comprehensive representation. To achieve this, we introduce a large-scale dataset named RefHuman, which substantially extends the MS COCO dataset with additional text and positional prompt annotations. RefHuman includes over 50,000 annotated instances in the wild, each equipped with keypoint, mask, and prompt annotations. To enable prompt-conditioned estimation, we propose the first end-to-end promptable approach named UniPHD for…
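To make the task's inputs and outputs concrete, the following is a minimal sketch of how one annotated instance might be represented: a prompt (text or positional) paired with keypoints and a mask. The class and field names (`Prompt`, `ReferredInstance`, `points`) are hypothetical and not taken from the paper or the RefHuman release. The keypoint layout assumes the MS COCO convention of `(x, y, v)` triplets, where `v` is 0 for unlabeled, 1 for labeled but not visible, and 2 for labeled and visible. Representing a positional prompt as a list of points is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Keypoint = Tuple[float, float, int]  # (x, y, v), assumed COCO-style visibility flag


@dataclass
class Prompt:
    """Hypothetical prompt container: exactly one of text or points is set."""
    text: Optional[str] = None
    points: Optional[List[Point]] = None  # assumed encoding of a positional prompt

    def __post_init__(self) -> None:
        # Reject prompts that set both fields or neither field.
        if (self.text is None) == (self.points is None):
            raise ValueError("exactly one of text or points must be set")

    @property
    def kind(self) -> str:
        return "text" if self.text is not None else "positional"


@dataclass
class ReferredInstance:
    """Hypothetical record for one referred person: prompt, keypoints, mask."""
    prompt: Prompt
    keypoints: List[Keypoint]
    mask: List[List[int]]  # binary mask as rows of 0/1 values


def labeled_keypoints(inst: ReferredInstance) -> int:
    """Count keypoints whose visibility flag marks them as labeled (v > 0)."""
    return sum(1 for _, _, v in inst.keypoints if v > 0)


def mask_area(inst: ReferredInstance) -> int:
    """Number of foreground pixels in the binary mask."""
    return sum(sum(row) for row in inst.mask)


example = ReferredInstance(
    prompt=Prompt(text="the person on the left holding an umbrella"),
    keypoints=[(12.0, 30.5, 2), (14.0, 29.0, 1), (0.0, 0.0, 0)],
    mask=[[0, 1, 1], [0, 1, 0]],
)
```

In this sketch the same record type serves both prompt kinds, so a positional variant would differ only in the `Prompt` it carries, for example `Prompt(points=[(13.0, 31.0)])`.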
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Video Surveillance and Tracking Methods
