Faster Than Real-time Facial Alignment: A 3D Spatial Transformer Network Approach in Unconstrained Poses
Chandrasekhar Bhagavatula, Chenchen Zhu, Khoa Luu, Marios Savvides

TL;DR
This paper introduces a fast, end-to-end 3D facial alignment method based on a 3D Spatial Transformer Network. It handles unconstrained poses and runs faster than real time, with state-of-the-art accuracy among 3D alignment methods.
Contribution
The authors propose a novel 3D Spatial Transformer Network that simultaneously estimates 3D face shape and 2D landmarks from a single image, trained solely on synthetic data.
Findings
Achieves faster than real-time facial alignment.
Outperforms existing 3D alignment methods on benchmark datasets.
Operates effectively on unconstrained poses.
Abstract
Facial alignment involves finding a set of landmark points on an image with a known semantic meaning. However, this semantic meaning of landmark points is often lost in 2D approaches where landmarks are either moved to visible boundaries or ignored as the pose of the face changes. In order to extract consistent alignment points across large poses, the 3D structure of the face must be considered in the alignment step. However, extracting a 3D structure from a single 2D image usually requires alignment in the first place. We present our novel approach to simultaneously extract the 3D shape of the face and the semantically consistent 2D alignment through a 3D Spatial Transformer Network (3DSTN) to model both the camera projection matrix and the warping parameters of a 3D model. By utilizing a generic 3D model and a Thin Plate Spline (TPS) warping function, we are able to generate subject…
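Illustrative sketch
The abstract describes recovering 2D landmarks by applying a camera projection matrix to a 3D face model. The snippet below is a minimal sketch of that projection step only, not the authors' implementation. It assumes a 3 x 4 projection matrix and a handful of made-up 3D points; the function name `project_landmarks` and all numeric values are hypothetical, and the TPS warping and the network that regresses these parameters are omitted.

```python
import numpy as np


def project_landmarks(points_3d, camera):
    """Project N x 3 model points to N x 2 image points.

    `camera` is assumed to be a 3 x 4 projection matrix (hypothetical
    convention for this sketch; the paper's parameterization may differ).
    """
    n = points_3d.shape[0]
    # Append a column of ones to work in homogeneous coordinates (N x 4).
    homogeneous = np.hstack([points_3d, np.ones((n, 1))])
    # Apply the camera matrix to every point at once (N x 3).
    projected = homogeneous @ camera.T
    # Divide by the homogeneous coordinate to get pixel positions.
    return projected[:, :2] / projected[:, 2:3]


# Toy stand-ins for a few vertices of a generic 3D face model.
model_points = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
])

# Toy camera: scale by 2, translate by (10, 20), no perspective division.
camera = np.array([
    [2.0, 0.0, 0.0, 10.0],
    [0.0, 2.0, 0.0, 20.0],
    [0.0, 0.0, 0.0, 1.0],
])

landmarks_2d = project_landmarks(model_points, camera)
print(landmarks_2d)
```

Because each 2D landmark is the projection of a fixed 3D model vertex, it keeps its semantic meaning as the pose changes, even when the corresponding point on the face is self-occluded.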
