Efficient Convolutional Neural Networks for Depth-Based Multi-Person Pose Estimation
Angel Martínez-González, Michael Villamizar, Olivier Canévet, and Jean-Marc Odobez

TL;DR
This paper develops fast, lightweight CNN architectures for multi-person 2D pose estimation using depth images, combining synthetic data, domain adaptation, and knowledge distillation to achieve accurate results efficiently.
Contribution
It introduces novel lightweight CNN designs for depth-based pose estimation, leveraging synthetic data, domain adaptation, and knowledge distillation to improve accuracy and speed.
Findings
Lightweight CNN architectures achieve competitive accuracy.
Synthetic and real data experiments validate the approach.
Knowledge distillation enhances model performance.
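As a rough illustration of how knowledge distillation can be applied to landmark heatmap regression, the sketch below blends the student's error against ground-truth heatmaps with its error against a larger teacher network's predictions. The function name `distillation_loss`, the mixing weight `alpha`, and the use of mean squared error for both terms are assumptions made for illustration; the summary above does not state the exact formulation used in the paper.

```python
import numpy as np


def distillation_loss(student, teacher, target, alpha=0.5):
    """Blend ground-truth and teacher supervision for heatmap regression.

    All three arrays share one shape, e.g. (num_landmarks, height, width).
    alpha is a hypothetical mixing weight, not a value taken from the paper.
    """
    student = np.asarray(student, dtype=np.float64)
    teacher = np.asarray(teacher, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if not (student.shape == teacher.shape == target.shape):
        raise ValueError("heatmaps must share one shape")

    hard = np.mean((student - target) ** 2)   # error against ground truth
    soft = np.mean((student - teacher) ** 2)  # error against teacher output
    return alpha * hard + (1.0 - alpha) * soft


# Toy usage with random heatmaps for 17 landmarks on a 32x32 grid.
rng = np.random.default_rng(0)
target = rng.random((17, 32, 32))
teacher = target + 0.01 * rng.standard_normal(target.shape)
student = rng.random((17, 32, 32))
loss = distillation_loss(student, teacher, target, alpha=0.5)
```

Setting `alpha=1.0` recovers plain supervised training, while lower values lean more on the teacher's predictions.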
Abstract
Achieving robust multi-person 2D body landmark localization and pose estimation is essential for understanding human behavior and interaction, as encountered, for instance, in human-robot interaction (HRI) settings. Accurate methods have been proposed recently, but they usually rely on rather deep Convolutional Neural Network (CNN) architectures, thus requiring large computational and training resources. In this paper, we investigate different architectures and methodologies to address these issues and achieve fast and accurate multi-person 2D pose estimation. To foster speed, we propose to work with depth images, whose structure contains sufficient information about body landmarks while being simpler than textured color images and thus potentially requiring less complex CNNs for processing. In this context, we make the following contributions: i) we study several CNN architecture designs combining pose machines…
Taxonomy
Methods: Knowledge Distillation
