Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation
Yihan Wang, Muyang Li, Han Cai, Wei-Ming Chen, and Song Han

TL;DR
LitePose is a novel, efficient architecture for 2D human pose estimation that reduces computational cost and latency on edge devices while maintaining high accuracy, by removing redundant high-resolution branches and introducing capacity-enhancing techniques.
Contribution
The paper introduces LitePose, a single-branch architecture for pose estimation, and proposes two techniques, Fusion Deconv Head and Large Kernel Convs, that increase its capacity at low computational cost. It reports lower latency than prior efficient pose estimation models.
Findings
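LitePose reduces latency by up to 5x on mobile platforms compared with prior efficient pose estimation models.
7x7 kernels achieve +14.0 mAP over 3x3 kernels on the CrowdPose dataset, with only a 25% increase in computation.
Removing HRNet's high-resolution branches improves both efficiency and performance for models in the low-computation region.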
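The kernel-size finding is easier to read with a rough cost model. The sketch below counts multiply-accumulates (MACs) for a single layer, assuming a depthwise-separable convolution (a k x k depthwise conv followed by a 1x1 pointwise conv). That layer type, the helper names, and the layer shape are assumptions made for illustration and are not taken from this summary. The numbers it prints describe one hypothetical layer, not the paper's full network.

```python
def depthwise_separable_macs(height, width, c_in, c_out, kernel):
    """MACs for a kernel x kernel depthwise conv followed by a 1x1 pointwise conv.

    Assumes stride 1 and 'same' padding, so the output is height x width.
    """
    depthwise = height * width * c_in * kernel * kernel
    pointwise = height * width * c_in * c_out
    return depthwise + pointwise


def dense_conv_macs(height, width, c_in, c_out, kernel):
    """MACs for an ordinary (non-grouped) kernel x kernel convolution."""
    return height * width * c_in * c_out * kernel * kernel


# Hypothetical layer shape, chosen only for illustration.
H, W, C = 64, 64, 96

sep_3 = depthwise_separable_macs(H, W, C, C, kernel=3)
sep_7 = depthwise_separable_macs(H, W, C, C, kernel=7)
dense_3 = dense_conv_macs(H, W, C, C, kernel=3)
dense_7 = dense_conv_macs(H, W, C, C, kernel=7)

print(f"separable 7x7 / 3x3 cost ratio: {sep_7 / sep_3:.2f}")
print(f"dense     7x7 / 3x3 cost ratio: {dense_7 / dense_3:.2f}")
```

Under these assumptions the pointwise term dominates the separable layer, so growing the kernel from 3x3 to 7x7 raises its cost far less than it would for a dense convolution, where the cost scales with the full 49/9 ratio.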
Abstract
Pose estimation plays a critical role in human-centered vision applications. However, it is difficult to deploy state-of-the-art HRNet-based pose estimation models on resource-constrained edge devices due to the high computational cost (more than 150 GMACs per frame). In this paper, we study efficient architecture design for real-time multi-person pose estimation on edge. We reveal that HRNet's high-resolution branches are redundant for models at the low-computation region via our gradual shrinking experiments. Removing them improves both efficiency and performance. Inspired by this finding, we design LitePose, an efficient single-branch architecture for pose estimation, and introduce two simple approaches to enhance the capacity of LitePose, including Fusion Deconv Head and Large Kernel Convs. Fusion Deconv Head removes the redundancy in high-resolution branches, allowing scale-aware…
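One standard way to reason about what larger kernels buy in a single-branch network is the receptive field: how many input pixels can influence one output position. The sketch below applies the usual recurrence for a plain stack of convolutions. The function name and the four-layer stack are hypothetical and do not describe LitePose's actual layer configuration.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of a stack of convs.

    layers: list of (kernel, stride) tuples, ordered from input to output.
    """
    rf = 1    # a single input pixel sees itself
    jump = 1  # distance in input pixels between adjacent positions at this depth
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def make_stack(kernel):
    # Hypothetical stack: two stride-2 layers interleaved with two stride-1 layers.
    return [(kernel, 2), (kernel, 1), (kernel, 2), (kernel, 1)]


rf_3 = receptive_field(make_stack(3))
rf_7 = receptive_field(make_stack(7))
print(f"3x3 stack receptive field: {rf_3} px")
print(f"7x7 stack receptive field: {rf_7} px")
```

With the same depth and strides, the 7x7 stack covers a much wider input window than the 3x3 stack, which is one plausible reason large kernels help a shallow, low-computation model.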
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Hand Gesture Recognition Systems
