A Recurrent Encoder-Decoder Network for Sequential Face Alignment
Xi Peng, Rogerio S. Feris, Xiaoyu Wang, Dimitris N. Metaxas

TL;DR
This paper introduces a recurrent encoder-decoder network for real-time video face alignment, leveraging spatial and temporal recurrent learning to improve accuracy and generalization in facial point detection.
Contribution
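It presents a single encoder-decoder model that applies recurrent learning in two places: a spatial feedback loop from the output response map back to the input, and temporal recurrence over bottleneck features that have been decoupled from identity.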
Findings
Outperforms state-of-the-art methods on standard datasets.
Enables iterative coarse-to-fine alignment with a single network.
Improves generalization by decoupling pose/expression from identity.
Abstract
We propose a novel recurrent encoder-decoder network model for real-time video-based face alignment. Our proposed model predicts 2D facial point maps regularized by a regression loss, while uniquely exploiting recurrent learning at both spatial and temporal dimensions. At the spatial level, we add a feedback loop connection between the combined output response map and the input, in order to enable iterative coarse-to-fine face alignment using a single network model. At the temporal level, we first decouple the features in the bottleneck of the network into temporal-variant factors, such as pose and expression, and temporal-invariant factors, such as identity information. Temporal recurrent learning is then applied to the decoupled temporal-variant features, yielding better generalization and significantly more accurate results at test time. We perform a comprehensive experimental…
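The two recurrences described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: random linear maps stand in for the trained encoder and decoder, a plain tanh recurrence stands in for whatever recurrent unit the paper uses, and all sizes and names (`encode`, `decode`, `align_frame`, `W_rec`) are hypothetical. It only shows the data flow: response maps fed back to the input for several spatial steps, and temporal recurrence applied to the pose/expression part of the bottleneck while the identity part bypasses it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen only for illustration (assumptions, not from the paper).
H = W = 8            # frame height and width
N_POINTS = 3         # number of facial points, one response map each
D_ID, D_PE = 4, 4    # identity code size, pose/expression code size

# Random linear maps stand in for the trained encoder and decoder.
in_dim = H * W * (1 + N_POINTS)  # frame plus fed-back response maps
W_enc = rng.normal(scale=0.1, size=(D_ID + D_PE, in_dim))
W_dec = rng.normal(scale=0.1, size=(H * W * N_POINTS, D_ID + D_PE))
# Hypothetical temporal recurrence weights (simple tanh unit).
W_rec = rng.normal(scale=0.1, size=(D_PE, 2 * D_PE))


def encode(frame, maps):
    """Encode the frame together with the current response maps,
    then split the bottleneck into identity and pose/expression codes."""
    x = np.concatenate([frame.ravel(), maps.ravel()])
    code = np.tanh(W_enc @ x)
    return code[:D_ID], code[D_ID:]


def decode(identity, pose_expr):
    """Decode the recombined bottleneck into one response map per point."""
    code = np.concatenate([identity, pose_expr])
    return (W_dec @ code).reshape(N_POINTS, H, W)


def align_frame(frame, hidden, n_spatial_steps=3):
    """Run the spatial feedback loop for one frame.

    `hidden` is the temporal state carried over from the previous frame;
    it is mixed only with the temporal-variant (pose/expression) code.
    """
    maps = np.zeros((N_POINTS, H, W))  # start with empty response maps
    for _ in range(n_spatial_steps):
        identity, pose_expr = encode(frame, maps)
        new_hidden = np.tanh(W_rec @ np.concatenate([pose_expr, hidden]))
        maps = decode(identity, new_hidden)  # fed back on the next step
    return maps, new_hidden


# Usage: process a short random "video", carrying the temporal state forward.
hidden = np.zeros(D_PE)
for frame in rng.random((5, H, W)):
    maps, hidden = align_frame(frame, hidden)
print(maps.shape, hidden.shape)
```

Keeping the identity code out of the recurrent update mirrors the abstract's split between temporal-invariant and temporal-variant factors; how the real model enforces that split during training is not shown here.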
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Face and Expression Recognition
