DiffPose-Animal: A Language-Conditioned Diffusion Framework for Animal Pose Estimation

Tianyu Xiong; Dayi Tan; Wei Tian

arXiv:2508.08783·cs.CV·December 16, 2025

TL;DR

DiffPose-Animal introduces a diffusion-based, language-guided framework for animal pose estimation, effectively handling diverse species, occlusions, and limited data by integrating semantic priors and progressive refinement.

Contribution

It pioneers the use of diffusion models combined with language-derived semantic priors for top-down animal pose estimation, enhancing robustness and generalization.

Findings

01 Outperforms existing methods on public datasets.

02 Effectively handles occlusion and sparse annotations.

03 Demonstrates strong generalization across diverse species.

Abstract

Animal pose estimation is a fundamental task in computer vision, with growing importance in ecological monitoring, behavioral analysis, and intelligent livestock management. Compared to human pose estimation, animal pose estimation is more challenging due to high interspecies morphological diversity, complex body structures, and limited annotated data. In this work, we introduce DiffPose-Animal, a novel diffusion-based framework for top-down animal pose estimation. Unlike traditional heatmap regression methods, DiffPose-Animal reformulates pose estimation as a denoising process under the generative framework of diffusion models. To enhance semantic guidance during keypoint generation, we leverage large language models (LLMs) to extract both global anatomical priors and local keypoint-wise semantics based on species-specific prompts. These textual priors are encoded and fused with image…
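To make the "pose estimation as a denoising process" idea concrete, here is a minimal sketch of a generic DDPM-style forward noising step and a deterministic DDIM-style reverse loop applied to 2D keypoint coordinates. It is not the authors' implementation: the linear noise schedule, the 17-keypoint layout, the function names, and the `eps_model` callable are all assumptions made for illustration. In a real system `eps_model` would be a trained network conditioned on image and text features; the sketch only assumes it maps a noisy pose and a timestep to a noise estimate.

```python
import numpy as np

def make_alpha_bar(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention terms for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, noise):
    """Forward process: sample x_t from q(x_t | x_0) given a fixed noise draw."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

def ddim_step(x_t, t, t_prev, alpha_bar, eps_hat):
    """One deterministic reverse step from timestep t to t_prev (t_prev < 0 means 'finish')."""
    # Estimate the clean keypoints implied by the predicted noise.
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])
    if t_prev < 0:
        return x0_hat
    # Re-noise the estimate to the (lower) noise level of t_prev.
    return np.sqrt(alpha_bar[t_prev]) * x0_hat + np.sqrt(1.0 - alpha_bar[t_prev]) * eps_hat

def denoise_keypoints(x_T, alpha_bar, eps_model, stride=20):
    """Progressively refine noisy keypoints of shape (K, 2) into a pose estimate.

    eps_model(x_t, t) is a hypothetical noise predictor; conditioning on image
    or language features would happen inside it.
    """
    timesteps = list(range(len(alpha_bar) - 1, -1, -stride))
    x = x_T
    for i, t in enumerate(timesteps):
        t_prev = timesteps[i + 1] if i + 1 < len(timesteps) else -1
        x = ddim_step(x, t, t_prev, alpha_bar, eps_model(x, t))
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bar = make_alpha_bar()
    keypoints = rng.uniform(-1.0, 1.0, size=(17, 2))  # normalized (x, y) per keypoint
    noise = rng.standard_normal(keypoints.shape)
    noisy = add_noise(keypoints, len(alpha_bar) - 1, alpha_bar, noise)
    # Oracle predictor that returns the true noise, standing in for a trained network.
    refined = denoise_keypoints(noisy, alpha_bar, lambda x, t: noise)
    print("max abs error:", np.abs(refined - keypoints).max())
```

With the oracle predictor the loop recovers the original keypoints up to floating-point error, which is a useful sanity check for the schedule arithmetic; with a learned predictor each step would only approximate the clean pose, which is why the refinement is run over several steps.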

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Robot Manipulation and Learning