Automatic infant 2D pose estimation from videos: comparing seven deep neural network methods
Filipe Gama, Matej Misar, Lukas Navara, Sergiu T. Popescu, Matej Hoffmann

TL;DR
This study evaluates and compares seven deep learning-based methods for automatic 2D infant pose estimation from videos. It finds that ViTPose performs best and reports on the methods' reliability and real-time capabilities.
Contribution
It systematically assesses seven popular pose estimation methods on infant videos, introduces new error metrics, and offers practical tools for implementation and further research.
Findings
ViTPose performs best among tested methods.
All methods except DeepLabCut and MediaPipe are competitive without fine-tuning.
AlphaPose achieves near real-time performance.
Abstract
Automatic markerless estimation of infant posture and motion from ordinary videos carries great potential for movement studies "in the wild", facilitating understanding of motor development and massively increasing the chances of early diagnosis of disorders. There is rapid development of human pose estimation methods in computer vision thanks to advances in deep learning and machine learning. However, these methods are trained on datasets that feature adults in different contexts. This work tests and compares seven popular methods (AlphaPose, DeepLabCut/DeeperCut, Detectron2, HRNet, MediaPipe/BlazePose, OpenPose, and ViTPose) on videos of infants in supine position and in more complex settings. Surprisingly, all methods except DeepLabCut and MediaPipe have competitive performance without additional finetuning, with ViTPose performing best. Next to standard performance metrics (average…
Taxonomy
Topics: Human Pose and Action Recognition · Social Robot Interaction and HRI · Hand Gesture Recognition Systems
Methods: Residual Connection · Convolution · Batch Normalization · HRNet · OpenPose
