Learning Structure-Supporting Dependencies via Keypoint Interactive Transformer for General Mammal Pose Estimation

Tianyang Xu; Jiyong Rao; Xiaoning Song; Zhenhua Feng; Xiao-Jun Wu

arXiv:2502.18214·cs.CV·February 26, 2025

TL;DR

This paper introduces a Keypoint Interactive Transformer (KIT) that learns structural dependencies among keypoints to improve general mammal pose estimation across diverse species with high appearance and pose variability.

Contribution

The paper proposes a novel KIT model with instance-level structure-supporting dependencies and a keypoint clustering method for body part bias, advancing general mammal pose estimation.

Findings

1. Effective handling of appearance and pose variances across species
2. Improved accuracy in mammal pose estimation tasks
3. Robustness to keypoint imbalance issues

Abstract

General mammal pose estimation is an important and challenging task in computer vision, and it is essential for understanding mammal behaviour in real-world applications. However, existing studies are still at a preliminary stage and address the problem for only a few specific mammal species. In moving from specific to general mammal pose estimation, the biggest issue is how to handle the large appearance and pose variances across species. We argue that, given the appearance context, instance-level priors and the structural relations among keypoints can serve as complementary evidence. To this end, we propose a Keypoint Interactive Transformer (KIT) to learn instance-level structure-supporting dependencies for general mammal pose estimation. Specifically, our KITPose consists of two coupled components. The first component is to extract keypoint features and…
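
To make the idea of keypoints interacting through a Transformer more concrete, the following is a minimal sketch of generic single-head scaled dot-product self-attention applied to one token per keypoint. It is not the authors' KITPose architecture, which the truncated abstract does not fully describe. All names (`keypoint_self_attention`, `random_matrix`), the choice of 17 keypoints, the feature size of 8, and the random weights are illustrative assumptions.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def keypoint_self_attention(tokens, w_q, w_k, w_v):
    """Generic single-head self-attention over keypoint tokens.

    tokens: one feature vector per keypoint (hypothetical inputs).
    Returns the updated tokens and the keypoint-to-keypoint attention matrix.
    """
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    v = matmul(tokens, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    # scores[i][j]: how strongly keypoint i attends to keypoint j
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    attn = [softmax(r) for r in scores]
    return matmul(attn, v), attn


def random_matrix(rows, cols, rng):
    """Illustrative random weights; a trained model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
num_keypoints, dim = 17, 8  # assumed sizes, for illustration only
tokens = random_matrix(num_keypoints, dim, rng)
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))
updated, attn = keypoint_self_attention(tokens, w_q, w_k, w_v)
```

Each row of `attn` is a probability distribution over all keypoints, so every keypoint's updated feature is a weighted mixture of the others. In this simplified picture, that mixing is where dependencies among keypoints would be expressed.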

Code & Models

Repositories

Raojiyong/KITPose (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Robotic Locomotion and Control · Robot Manipulation and Learning

Methods: Attention Is All You Need · Absolute Position Encodings · Linear Layer · Layer Normalization · Byte Pair Encoding · Dense Connections · Residual Connection · Label Smoothing · Multi-Head Attention · Position-Wise Feed-Forward Layer