LAMP: Leveraging Language Prompts for Multi-person Pose Estimation
Shengnan Hu, Ce Zheng, Zixiang Zhou, Chen Chen, and Gita Sukthankar

TL;DR
LAMP is a prompt-based approach that uses text representations from CLIP to improve multi-person pose estimation in crowded scenes, targeting the two main failure modes there: occluded joints and instance separation.
Contribution
The paper presents a novel prompt-based pose inference strategy using language representations to improve multi-person pose estimation, addressing occlusion and instance separation challenges.
Findings
Language-supervised training boosts pose estimation performance.
Instance-level and joint-level prompts are both valuable.
LAMP achieves more robust pose understanding in crowded scenes.
Abstract
Human-centric visual understanding is an important desideratum for effective human-robot interaction. In order to navigate crowded public places, social robots must be able to interpret the activity of the surrounding humans. This paper addresses one key aspect of human-centric visual understanding, multi-person pose estimation. Achieving good performance on multi-person pose estimation in crowded scenes is difficult due to the challenges of occluded joints and instance separation. In order to tackle these challenges and overcome the limitations of image features in representing invisible body parts, we propose a novel prompt-based pose inference strategy called LAMP (Language Assisted Multi-person Pose estimation). By utilizing the text representations generated by a well-trained language model (CLIP), LAMP can facilitate the understanding of poses on the instance and joint levels, and…
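The abstract describes prompts at two levels (instance and joint) whose text representations come from CLIP, but the visible portion gives neither the prompt templates nor how the text features are used. The following is only a minimal sketch of the general idea, under stated assumptions: the prompt wording, the joint list, and the helper names `build_prompts`, `cosine`, and `best_match` are all hypothetical, and short hand-written vectors stand in for the embeddings that a CLIP text encoder and an image backbone would produce.

```python
import math

# Hypothetical subset of joint names; the paper's actual joint set is not given here.
JOINT_NAMES = ["nose", "left shoulder", "right shoulder", "left knee", "right knee"]


def build_prompts(num_instances, joint_names):
    """Build illustrative instance-level and joint-level text prompts.

    The templates are assumptions for this sketch, not the paper's wording.
    """
    instance_prompts = [f"person {i + 1} in the image" for i in range(num_instances)]
    joint_prompts = [f"the {name} of a person" for name in joint_names]
    return instance_prompts, joint_prompts


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def best_match(visual_feature, text_embeddings):
    """Return the index of the text embedding most similar to a visual feature."""
    scores = [cosine(visual_feature, t) for t in text_embeddings]
    return max(range(len(scores)), key=scores.__getitem__)


instance_prompts, joint_prompts = build_prompts(2, JOINT_NAMES)

# Stand-in 2-D embeddings; real ones would come from a text encoder such as CLIP's.
toy_text_embeddings = [[0.0, 1.0], [1.0, 0.0]]
matched = best_match([1.0, 0.1], toy_text_embeddings)
```

In this toy setup a visual feature is assigned to whichever prompt embedding it is most similar to. Matching by cosine similarity is a common pattern with CLIP-style embeddings and is used here for illustration; it should not be read as the training objective or inference rule that LAMP itself uses.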
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Hand Gesture Recognition Systems
