TL;DR
This paper introduces a multi-task deep learning model for head pose estimation, face alignment, and visibility detection in unconstrained images, leveraging task dependencies to improve overall accuracy.
Contribution
A novel encoder-decoder CNN architecture with residual blocks and strategic task placement that enhances multi-task head pose, alignment, and visibility estimation.
Findings
Outperforms the state of the art on the head pose and visibility tasks
Achieves face alignment results comparable to the best methods
Utilizes task dependencies to boost overall performance
Abstract
We present a deep learning-based multi-task approach for head pose estimation in images. We contribute a network architecture and training strategy that harness the strong dependencies among face pose, alignment and visibility to produce a top-performing model for all three tasks. Our architecture is an encoder-decoder CNN with residual blocks and lateral skip connections. We show that combining head pose estimation with landmark-based face alignment significantly improves the performance of the former task. Further, placing the pose task at the bottleneck layer, at the end of the encoder, and placing the tasks that depend on spatial information, such as visibility and alignment, in the final decoder layer also contributes to increasing the final performance. In the experiments conducted, the proposed model outperforms the state of the art in the face pose and visibility…
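The task placement described in the abstract can be illustrated with a small shape-tracing sketch. It is not the authors' implementation: the input size, channel widths, number of stages, the 68-landmark count, the heatmap output for alignment, and the three-angle pose output are all assumptions chosen for illustration, and `Stage` and `build_plan` are hypothetical names. The sketch only shows where each head would attach, with pose at the encoder bottleneck and alignment and visibility at the last decoder stage.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    channels: int
    size: int  # side length of the (square) feature map


def build_plan(input_size=128, base_channels=32, depth=4, n_landmarks=68):
    """Trace feature-map shapes through a hypothetical encoder-decoder.

    All default values are illustrative assumptions, not taken from the paper.
    """
    # Encoder: each stage halves the resolution and doubles the channels.
    encoder = []
    size, channels = input_size, base_channels
    for i in range(depth):
        size //= 2
        encoder.append(Stage(f"enc{i + 1}", channels, size))
        channels *= 2

    # The bottleneck is the last encoder stage (lowest resolution).
    bottleneck = encoder[-1]

    # Decoder: each stage upsamples by 2 and merges the lateral skip
    # connection from the encoder stage with the same resolution.
    decoder = []
    for i, skip in enumerate(reversed(encoder[:-1])):
        decoder.append(Stage(f"dec{i + 1}", skip.channels, skip.size))

    last = decoder[-1]
    heads = {
        # Pose needs global context, so it reads the bottleneck.
        # Assumed output: three angles (yaw, pitch, roll).
        "pose": {"attached_to": bottleneck.name, "output_shape": (3,)},
        # Alignment and visibility depend on spatial detail, so they read
        # the final decoder stage. Assumed output: one heatmap per landmark.
        "alignment": {
            "attached_to": last.name,
            "output_shape": (n_landmarks, last.size, last.size),
        },
        # Assumed output: one visibility score per landmark.
        "visibility": {"attached_to": last.name, "output_shape": (n_landmarks,)},
    }
    return encoder, decoder, heads


if __name__ == "__main__":
    encoder, decoder, heads = build_plan()
    for stage in encoder + decoder:
        print(stage)
    for task, info in heads.items():
        print(task, info)
```

Under these assumed defaults the pose head reads an 8x8 map at `enc4`, while the alignment and visibility heads read a 64x64 map at `dec3`, which mirrors the abstract's split between a global task at the bottleneck and spatially dependent tasks at the decoder output.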
