TL;DR
This paper introduces AggPose, a novel transformer-based model for infant pose estimation, along with a new infant pose dataset, demonstrating significant performance improvements over existing models.
Contribution
The paper presents AggPose, a transformer-based framework for infant pose estimation, and provides a large-scale infant pose dataset to address the lack of benchmarks.
Findings
AggPose outperforms HRFormer and TokenPose on the infant pose dataset.
AggPose improves COCO pose estimation AP by 0.8 over HRFormer.
The model effectively learns multi-scale features for infant pose estimation.
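The AP figures above are easier to read with the underlying similarity measure in view. Assuming the standard COCO keypoint protocol, AP is computed from Object Keypoint Similarity (OKS) between each predicted and ground-truth pose, averaged over OKS thresholds from 0.50 to 0.95. The following is a minimal sketch of OKS for a single pose pair; the function name, argument layout, and the per-keypoint constants used in the example are illustrative assumptions, not values taken from the paper or its infant dataset.

```python
import numpy as np


def oks(pred, gt, visible, area, k):
    """Object Keypoint Similarity for one predicted / ground-truth pose pair.

    pred, gt : (K, 2) arrays of (x, y) keypoint coordinates in pixels
    visible  : (K,) boolean array, True where the ground-truth keypoint is labeled
    area     : ground-truth object area in pixels (the squared scale s^2)
    k        : (K,) per-keypoint falloff constants (hypothetical values here)
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    visible = np.asarray(visible, dtype=bool)
    k = np.asarray(k, dtype=float)

    if not visible.any():
        # No labeled keypoints: similarity is undefined, report 0 by convention here.
        return 0.0

    # Squared Euclidean distance per keypoint.
    d2 = np.sum((pred - gt) ** 2, axis=1)
    # Gaussian falloff, normalized by object scale and per-keypoint constant.
    sim = np.exp(-d2 / (2.0 * area * k ** 2))
    # Average over labeled keypoints only.
    return float(sim[visible].mean())


# Illustrative usage with made-up coordinates and constants.
gt_pose = np.array([[10.0, 10.0], [20.0, 20.0]])
pred_pose = np.array([[11.0, 11.0], [20.0, 20.0]])
labeled = np.array([True, True])
constants = np.array([0.1, 0.1])
score = oks(pred_pose, gt_pose, labeled, area=100.0, k=constants)
```

A prediction counts as a match at a given threshold when its OKS meets that threshold, so a 0.8 AP gain reflects more poses clearing the stricter thresholds, not a 0.8 pixel change in error.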
Abstract
Movement and pose assessment of newborns lets experienced pediatricians predict neurodevelopmental disorders, allowing early intervention for related diseases. However, most of the newest AI approaches to human pose estimation focus on adults, and a public benchmark for infant pose estimation is lacking. In this paper, we fill this gap by proposing an infant pose dataset and a Deep Aggregation Vision Transformer for human pose estimation, which introduces a fast-trained full transformer framework that does not use convolution operations to extract features in the early stages. It generalizes Transformer + MLP to high-resolution deep layer aggregation within feature maps, thus enabling information fusion between different vision levels. We pre-train AggPose on the COCO pose dataset and apply it to our newly released large-scale infant pose estimation dataset. The results show that AggPose could…
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Adam · Byte Pair Encoding · Absolute Position Encodings · Multi-Head Attention · Residual Connection · Dense Connections · Label Smoothing · Vision Transformer
