On the power of data augmentation for head pose estimation

Michael Welter

arXiv:2407.05357 · cs.CV · October 17, 2024

TL;DR

This paper explores how data augmentation and synthesis can improve deep learning models for head pose estimation from monocular images, and introduces a multitask head/loss design with uncertainty estimation. The resulting models are small, efficient, and competitive in accuracy.

Contribution

It proposes a novel multitask head/loss design with uncertainty estimation and demonstrates improved head pose estimation performance using augmented and synthesized data.

Findings

01 The models are small and efficient.

02 The models achieve competitive accuracy in full 6 DoF pose estimation.

03 Data augmentation and synthesis strategies improve performance.

Abstract

Deep learning has been impressively successful in the last decade in predicting human head poses from monocular images. However, for in-the-wild inputs, the research community relies predominantly on a single training set, 300W-LP, which is semisynthetic and has few alternatives. This paper focuses on gradually extending and improving the data to further explore the performance achievable with augmentation and synthesis strategies. On the modeling side, a novel multitask head/loss design that includes uncertainty estimation is proposed. Overall, the resulting models are small, efficient, suitable for full 6 DoF pose estimation, and exhibit very competitive accuracy.

Code & Models

Repositories

opentrack/neuralnet-tracker-traincode (PyTorch, official)

Taxonomy

Topics: Face recognition and analysis · Human Pose and Action Recognition · 3D Shape Modeling and Analysis

Methods: Sparse Evolutionary Training