AP-10K: A Benchmark for Animal Pose Estimation in the Wild
Hang Yu, Yufei Xu, Jing Zhang, Wei Zhao, Ziyu Guan, Dacheng Tao

TL;DR
AP-10K is a comprehensive large-scale benchmark dataset designed to improve animal pose estimation across diverse species, enabling better generalization and transfer learning in wildlife research.
Contribution
This paper introduces AP-10K, the first extensive benchmark dataset for mammal pose estimation across multiple species, supporting diverse evaluation tracks and advancing research in the field.
Findings
Diverse animal species improve pose estimation accuracy.
Transfer learning from human to animal pose estimation is effective.
Models generalize better with intra- and inter-family training.
Abstract
Accurate animal pose estimation is an essential step towards understanding animal behavior, and can potentially benefit many downstream applications, such as wildlife conservation. Previous works only focus on specific animals while ignoring the diversity of animal species, limiting the generalization ability. In this paper, we propose AP-10K, the first large-scale benchmark for mammal animal pose estimation, to facilitate the research in animal pose estimation. AP-10K consists of 10,015 images collected and filtered from 23 animal families and 54 species following the taxonomic rank and high-quality keypoint annotations labeled and checked manually. Based on AP-10K, we benchmark representative pose estimation models on the following three tracks: (1) supervised learning for animal pose estimation, (2) cross-domain transfer learning from human pose estimation to animal pose estimation,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHuman Pose and Action Recognition · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
