AggPose: Deep Aggregation Vision Transformer for Infant Pose Estimation

Xu Cao; Xiaoye Li; Liya Ma; Yi Huang; Xuan Feng; Zening Chen; Hongwu Zeng; Jianguo Cao

arXiv:2205.05277·cs.CV·December 27, 2022


TL;DR

This paper introduces AggPose, a novel transformer-based model for infant pose estimation, along with a new infant pose dataset, demonstrating significant performance improvements over existing models.

Contribution

The paper presents AggPose, a transformer-based framework for infant pose estimation, and provides a large-scale infant pose dataset to address the lack of benchmarks.

Findings

01. AggPose outperforms HRFormer and TokenPose on the infant pose dataset.

02. AggPose improves COCO pose estimation AP by 0.8 over HRFormer.

03. The model effectively learns multi-scale features for infant pose estimation.
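
For context on the AP figures in these findings: COCO keypoint AP is computed by thresholding the Object Keypoint Similarity (OKS) between predicted and ground-truth poses. The block below is a minimal sketch of OKS for a single pose pair, not the paper's evaluation code. The function name `oks` and the sigma values in the usage lines are illustrative assumptions; real evaluations use dataset-specific per-keypoint constants.

```python
import math


def oks(pred, gt, visible, area, sigmas):
    """Object Keypoint Similarity for one predicted / ground-truth pose pair.

    pred, gt : lists of (x, y) tuples, one per keypoint
    visible  : list of ints, > 0 where the ground-truth keypoint is labeled
    area     : ground-truth object area in pixels^2 (the scale term s^2)
    sigmas   : per-keypoint falloff constants (hypothetical values here)
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2  # squared pixel distance
        k = 2.0 * sigma                       # COCO uses k_i = 2 * sigma_i
        total += math.exp(-d2 / (2.0 * area * k ** 2))
        count += 1
    return total / count if count else 0.0


# Toy usage with made-up coordinates and sigmas (assumptions, not dataset values).
gt_pose = [(10.0, 10.0), (20.0, 20.0)]
pred_pose = [(11.0, 10.0), (20.0, 20.0)]
score = oks(pred_pose, gt_pose, visible=[2, 2], area=100.0, sigmas=[0.05, 0.05])
print(f"OKS = {score:.3f}")
```

A prediction counts as correct at a given threshold (for example 0.5) when its OKS meets that threshold; AP averages precision over a range of thresholds.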

Abstract

Movement and pose assessment of newborns lets experienced pediatricians predict neurodevelopmental disorders, allowing early intervention for related diseases. However, most recent AI approaches to human pose estimation focus on adults, and there is no public benchmark for infant pose estimation. In this paper, we fill this gap by proposing an infant pose dataset and the Deep Aggregation Vision Transformer for human pose estimation, which introduces a fast-trained, fully transformer-based framework that does not use convolution operations to extract features in the early stages. It generalizes Transformer + MLP to high-resolution deep layer aggregation within feature maps, thus enabling information fusion between different vision levels. We pre-train AggPose on the COCO pose dataset and apply it to our newly released large-scale infant pose estimation dataset. The results show that AggPose could…
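
The abstract's idea of fusing information between vision levels can be illustrated with a toy example. The sketch below is not the AggPose architecture: it only shows one generic way to combine a high-resolution and a low-resolution feature map, by nearest-neighbor upsampling followed by a per-position linear layer with ReLU. All function names (`upsample_nearest`, `linear`, `fuse`), the weights, and the feature values are hypothetical.

```python
def upsample_nearest(fmap, factor):
    """Repeat each cell of a 2D grid of feature vectors `factor` times per axis."""
    out = []
    for row in fmap:
        wide = [vec for vec in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def linear(vec, weight, bias):
    """Apply y = W x + b to one feature vector (plain lists, no framework)."""
    return [sum(w * x for w, x in zip(w_row, vec)) + b
            for w_row, b in zip(weight, bias)]


def fuse(high, low, weight, bias, factor=2):
    """Concatenate upsampled low-res features onto high-res features,
    then mix channels at every position with a linear layer + ReLU."""
    low_up = upsample_nearest(low, factor)
    fused = []
    for h_row, l_row in zip(high, low_up):
        fused.append([[max(0.0, y) for y in linear(h + l, weight, bias)]
                      for h, l in zip(h_row, l_row)])
    return fused


# Toy inputs (assumed values): a 2x2 map and a 1x1 map, one channel each.
high_res = [[[1.0], [2.0]], [[3.0], [4.0]]]
low_res = [[[10.0]]]
# With weight [1, 1] and zero bias, each output is simply high + low.
out = fuse(high_res, low_res, weight=[[1.0, 1.0]], bias=[0.0])
print(out)
```

In a trained model the mixing weights are learned, and the fusion is repeated across several resolution levels rather than applied once.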

Code & Models

Repositories

szar-lab/aggpose (PyTorch, official)

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Adam · Byte Pair Encoding · Absolute Position Encodings · Multi-Head Attention · Residual Connection · Dense Connections · Label Smoothing · Vision Transformer