CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models
Junho Kim, Hyungjin Chung, Byung-Hoon Kim

TL;DR
CapeLLM introduces a multimodal large language model for category-agnostic pose estimation that uses only query images and text descriptions, outperforming previous methods and enhancing robustness and reasoning capabilities.
Contribution
The paper presents the first MLLM-based approach for CAPE, eliminating support images and leveraging rich priors in large language models for improved pose estimation.
Findings
Sets a new state of the art on the MP-100 benchmark in the 1-shot and 5-shot settings.
Effectively models spatial distribution and uncertainty of unseen keypoints.
Demonstrates robustness across input variations.
Abstract
Category-agnostic pose estimation (CAPE) has traditionally relied on support images with annotated keypoints, a process that is often cumbersome and may fail to fully capture the necessary correspondences across diverse object categories. Recent efforts have explored the use of text queries, leveraging their enhanced stability and generalization capabilities. However, existing approaches often remain constrained by their reliance on support queries, their failure to fully utilize the rich priors embedded in pre-trained large language models, and the limitations imposed by their parametric distribution assumptions. To address these challenges, we introduce CapeLLM, the first multimodal large language model (MLLM) designed for CAPE. Our method employs only a query image and detailed text descriptions as input to estimate category-agnostic keypoints. Our method encompasses effective…
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification
