AniMer: Animal Pose and Shape Estimation Using Family Aware Transformer

Jin Lyu; Tianyi Zhu; Yi Gu; Li Lin; Pujin Cheng; Yebin Liu; Xiaoying Tang; Liang An

arXiv:2412.00837 · cs.CV · July 8, 2025

TL;DR

AniMer is a novel Transformer-based framework that accurately estimates animal pose and shape across multiple species by leveraging a large synthetic dataset and a family-aware learning scheme, advancing animal behavior analysis.

Contribution

The paper introduces AniMer, a Transformer-based model with a family-aware contrastive learning scheme and a large-scale synthetic dataset, improving multi-species animal pose and shape estimation.
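
The family-aware contrastive idea can be illustrated with a standard supervised contrastive (SupCon-style) loss in which each sample's animal family serves as its class label: embeddings of animals from the same family are pulled together, and those from other families are pushed apart. This is a minimal sketch under stated assumptions, not AniMer's implementation. The function names, the temperature of 0.1, and the use of L2-normalized feature vectors are illustrative choices that are not taken from the paper.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def family_supcon_loss(
    embeddings: Sequence[Sequence[float]],
    family_ids: Sequence[str],
    temperature: float = 0.1,  # assumption: illustrative value, not from the paper
) -> float:
    """SupCon-style loss that treats animal family IDs as class labels.

    For each anchor i, positives are the other samples with the same family ID;
    the softmax denominator runs over every sample except the anchor itself.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and family_ids[j] == family_ids[i]]
        if not positives:
            continue  # an anchor with no same-family partner contributes nothing
        # Cosine similarity (dot product of unit vectors) scaled by temperature.
        logits = [sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
        others = [logits[j] for j in range(n) if j != i]
        # Log-sum-exp over non-anchor samples, shifted by the max for stability.
        m = max(others)
        log_denom = m + math.log(sum(math.exp(x - m) for x in others))
        # Average negative log-probability of the anchor's positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
by_family = ["felidae", "felidae", "canidae", "canidae"]
shuffled = ["felidae", "canidae", "felidae", "canidae"]
print(family_supcon_loss(clustered, by_family) < family_supcon_loss(clustered, shuffled))
```

Because every positive also appears in the denominator, each per-anchor term is non-negative, and embeddings that cluster by family give a lower loss than the same embeddings with shuffled family labels.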

Findings

1. AniMer outperforms existing methods on multiple datasets.
2. The synthetic CtrlAni3D dataset enhances training diversity.
3. Ablation studies confirm the effectiveness of the proposed components.

Abstract

Quantitative analysis of animal behavior and biomechanics requires accurate animal pose and shape estimation across species, and is important for animal welfare and biological research. However, the limited network capacity of previous methods and the limited availability of multi-species datasets leave this problem underexplored. To this end, this paper presents AniMer, which estimates animal pose and shape using a family-aware Transformer, enhancing reconstruction accuracy across diverse quadrupedal families. A key insight of AniMer is its integration of a high-capacity Transformer-based backbone with an animal-family-supervised contrastive learning scheme, unifying the discriminative understanding of various quadrupedal shapes within a single framework. For effective training, we aggregate most of the available open-source quadrupedal datasets, with either 3D or 2D labels. To improve the diversity of 3D labeled data, we…
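
The abstract does not say how datasets with 3D labels and datasets with only 2D labels are combined in one training objective. A common pattern in mesh-recovery training, assumed here purely for illustration, is to apply a 2D keypoint loss to every sample and a 3D parameter loss only to samples that carry 3D labels. The `Sample` container, the L1 and MSE choices, and the loss weights below are hypothetical and are not AniMer's actual losses.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point2D = Tuple[float, float]


@dataclass
class Sample:
    """Hypothetical per-image training record mixing 2D and optional 3D labels."""
    pred_kp2d: List[Point2D]                  # predicted keypoints projected to the image
    gt_kp2d: List[Point2D]                    # annotated 2D keypoints
    visibility: List[float]                   # 1.0 if a keypoint is annotated, else 0.0
    pred_params: List[float]                  # predicted parametric-model parameters
    gt_params: Optional[List[float]] = None   # present only for 3D-labeled datasets


def keypoint_loss(s: Sample) -> float:
    """Visibility-weighted mean L1 error over annotated 2D keypoints."""
    num = 0.0
    for (px, py), (gx, gy), v in zip(s.pred_kp2d, s.gt_kp2d, s.visibility):
        num += v * (abs(px - gx) + abs(py - gy))
    den = sum(s.visibility)
    return num / den if den > 0 else 0.0


def param_loss(s: Sample) -> float:
    """Mean squared error on model parameters; requires 3D ground truth."""
    assert s.gt_params is not None
    diffs = [(p - g) ** 2 for p, g in zip(s.pred_params, s.gt_params)]
    return sum(diffs) / len(diffs)


def mixed_batch_loss(samples: List[Sample], w2d: float = 1.0, w3d: float = 1.0) -> float:
    """2D term averaged over all samples; 3D term averaged over 3D-labeled samples only."""
    loss_2d = sum(keypoint_loss(s) for s in samples) / len(samples)
    labeled_3d = [s for s in samples if s.gt_params is not None]
    loss_3d = sum(param_loss(s) for s in labeled_3d) / len(labeled_3d) if labeled_3d else 0.0
    return w2d * loss_2d + w3d * loss_3d


with_3d = Sample([(0, 0), (1, 1)], [(0, 1), (1, 1)], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
only_2d = Sample([(2, 2)], [(0, 0)], [1.0], [0.5])
print(mixed_batch_loss([with_3d, only_2d]))
```

Masking the 3D term this way lets 2D-only images still contribute keypoint supervision without requiring placeholder 3D targets.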

Taxonomy

Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Face recognition and analysis

Methods: Byte Pair Encoding · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Residual Connection · Adam · Attention Is All You Need · Softmax · Label Smoothing · Dropout · Contrastive Learning