Efficient 2.5D Hand Pose Estimation via Auxiliary Multi-Task Training for Embedded Devices

Prajwal Chidananda, Ayan Sinha, Adithya Rao, Douglas Lee, Andrew Rabinovich (Magic Leap, Inc.)

arXiv:1909.05897 · cs.CV · September 16, 2019

Open Access

TL;DR

This paper presents a highly efficient 2.5D hand pose estimation method optimized for embedded devices, using a lightweight network and multi-task training to achieve real-time performance with minimal memory and computation.

Contribution

The authors introduce a compact network architecture and auxiliary multi-task training strategy that enable accurate 2.5D hand pose estimation on resource-constrained embedded devices.

Findings

01. Achieves over 50 Hz inference speed on embedded hardware.

02. Uses fewer than 35 MFLOPs and has a memory footprint under 300 KB.

03. Maintains performance comparable to larger models like MobileNetV2.

Abstract

2D key-point estimation is an important precursor to 3D pose estimation problems for the human body and hands. In this work, we discuss the data, architecture, and training procedure necessary to deploy extremely efficient 2.5D hand pose estimation on embedded devices with a highly constrained memory and compute envelope, such as AR/VR wearables. Our 2.5D hand pose estimation consists of 2D key-point estimation of joint positions on an egocentric image, captured by a depth sensor, and lifted to 2.5D using the corresponding depth values. Our contributions are twofold: (a) We discuss data labeling and augmentation strategies, and the modules in the network architecture that collectively lead to $3\%$ of the FLOP count and $2\%$ of the number of parameters when compared to the state-of-the-art MobileNetV2 architecture. (b) We propose an auxiliary multi-task training strategy needed to compensate for the…
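The lifting step the abstract describes (2D key-points combined with the corresponding depth values) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `lift_to_2p5d`, the (u, v) pixel convention, the nearest-pixel lookup, and the clamping at image borders are all assumptions made for illustration, and the depth map is assumed to be pixel-aligned with the image the key-points were predicted on.

```python
import numpy as np


def lift_to_2p5d(keypoints_uv, depth_map):
    """Attach a depth value to each 2D key-point.

    Hypothetical helper, not taken from the paper.
    keypoints_uv: (N, 2) sequence of (u, v) pixel coordinates (column, row).
    depth_map:    (H, W) array aligned with the key-point image.
    Returns an (N, 3) array of (u, v, depth).
    """
    keypoints_uv = np.asarray(keypoints_uv, dtype=np.float64)
    height, width = depth_map.shape

    # Assumption: nearest-pixel lookup, clamped to the image bounds.
    cols = np.clip(np.rint(keypoints_uv[:, 0]).astype(int), 0, width - 1)
    rows = np.clip(np.rint(keypoints_uv[:, 1]).astype(int), 0, height - 1)

    depths = depth_map[rows, cols].astype(np.float64)
    return np.column_stack([keypoints_uv, depths])


# Toy 3x4 depth map whose value at (row, col) is row * 4 + col.
depth = np.arange(12, dtype=np.float32).reshape(3, 4)
keypoints = [(1.2, 0.9), (3.0, 2.0)]
result = lift_to_2p5d(keypoints, depth)
```

Here `result` keeps the sub-pixel (u, v) coordinates and appends the depth read at the nearest pixel. A real pipeline would likely also need to handle invalid or missing depth readings, which this sketch ignores.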

Taxonomy

Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Advanced Neural Network Applications

Methods: Depthwise Convolution · Pointwise Convolution · Depthwise Separable Convolution · Batch Normalization · Inverted Residual Block · Average Pooling · 1x1 Convolution · Convolution