DiffPose-Animal: A Language-Conditioned Diffusion Framework for Animal Pose Estimation
Tianyu Xiong, Dayi Tan, Wei Tian

TL;DR
DiffPose-Animal introduces a diffusion-based, language-guided framework for animal pose estimation, effectively handling diverse species, occlusions, and limited data by integrating semantic priors and progressive refinement.
Contribution
It pioneers the use of diffusion models combined with language-derived semantic priors for top-down animal pose estimation, enhancing robustness and generalization.
Findings
Outperforms existing methods on public datasets.
Effectively handles occlusion and sparse annotations.
Demonstrates strong generalization across diverse species.
Abstract
Animal pose estimation is a fundamental task in computer vision, with growing importance in ecological monitoring, behavioral analysis, and intelligent livestock management. Compared to human pose estimation, animal pose estimation is more challenging due to high interspecies morphological diversity, complex body structures, and limited annotated data. In this work, we introduce DiffPose-Animal, a novel diffusion-based framework for top-down animal pose estimation. Unlike traditional heatmap regression methods, DiffPose-Animal reformulates pose estimation as a denoising process under the generative framework of diffusion models. To enhance semantic guidance during keypoint generation, we leverage large language models (LLMs) to extract both global anatomical priors and local keypoint-wise semantics based on species-specific prompts. These textual priors are encoded and fused with image…
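To make the "pose estimation as a denoising process" idea concrete, here is a minimal NumPy sketch of a standard DDPM-style forward noising step and a deterministic DDIM-style reverse loop applied to 2D keypoint coordinates. It is an illustration under stated assumptions, not the authors' implementation: the function names, the linear noise schedule, the 17-keypoint layout, and the `cond` argument (standing in for the fused image and text features) are all hypothetical, and the learned denoiser is replaced by an oracle that simply returns the ground-truth pose.

```python
import numpy as np

def make_alpha_bars(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear schedule (an assumption)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bars, rng):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

def ddim_denoise(xt, predict_x0, alpha_bars, start_t, cond=None):
    """Deterministic DDIM-style reverse loop from step start_t down to 0.

    predict_x0(x, t, cond) is a hypothetical stand-in for a trained network
    that estimates clean keypoints from noisy ones plus conditioning features.
    """
    x = xt
    for t in range(start_t, -1, -1):
        x0_hat = predict_x0(x, t, cond)
        if t == 0:
            return x0_hat
        # Noise implied by the current sample and the clean-pose estimate.
        eps_hat = (x - np.sqrt(alpha_bars[t]) * x0_hat) / np.sqrt(1.0 - alpha_bars[t])
        # Move to the previous, less noisy step.
        x = np.sqrt(alpha_bars[t - 1]) * x0_hat + np.sqrt(1.0 - alpha_bars[t - 1]) * eps_hat
    return x

# Toy run: 17 keypoints with normalized (x, y) coordinates (layout is assumed).
rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
true_pose = rng.uniform(0.0, 1.0, size=(17, 2))
last_t = len(alpha_bars) - 1
noisy_pose, _ = add_noise(true_pose, last_t, alpha_bars, rng)

# Oracle denoiser: always returns the true pose, ignoring its inputs.
oracle = lambda x, t, cond: true_pose
recovered = ddim_denoise(noisy_pose, oracle, alpha_bars, start_t=last_t)
print("max abs error:", np.abs(recovered - true_pose).max())
```

In a real system the oracle would be replaced by a trained network, and its accuracy, not the sampling loop, would determine how closely the recovered keypoints match the true pose.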
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Robot Manipulation and Learning
