RED-Net: A Recurrent Encoder-Decoder Network for Video-based Face Alignment
Xi Peng, Rogerio S. Feris, Xiaoyu Wang, Dimitris N. Metaxas

TL;DR
This paper introduces RED-Net, a real-time video face alignment method using a recurrent encoder-decoder that iteratively refines facial points and disentangles features for improved accuracy and generalization.
Contribution
The paper presents a novel recurrent encoder-decoder architecture with spatial feedback and feature disentangling for enhanced video face alignment.
Findings
Reports higher accuracy than prior state-of-the-art methods.
Demonstrates effective iterative coarse-to-fine alignment.
Shows improved generalization through feature disentangling.
Abstract
We propose a novel method for real-time face alignment in videos based on a recurrent encoder-decoder network model. Our proposed model predicts 2D facial point heat maps regularized by both detection and regression loss, while uniquely exploiting recurrent learning at both spatial and temporal dimensions. At the spatial level, we add a feedback loop connection between the combined output response map and the input, in order to enable iterative coarse-to-fine face alignment using a single network model, instead of relying on traditional cascaded model ensembles. At the temporal level, we first decouple the features in the bottleneck of the network into temporal-variant factors, such as pose and expression, and temporal-invariant factors, such as identity information. Temporal recurrent learning is then applied to the decoupled temporal-variant features. We show that such feature…
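The two kinds of recurrence described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_network` is a hypothetical stand-in for the trained encoder-decoder, combining heat maps by a per-pixel maximum is an assumption, and a plain tanh recurrence is swapped in because the abstract does not name the recurrent unit. All function names, shapes, and weights are illustrative.

```python
import numpy as np


def toy_network(x, num_points):
    """Hypothetical placeholder for the encoder-decoder.

    Maps an (H, W, C) input to (H, W, num_points) heat maps by averaging
    the channels. A real model would be a trained convolutional network.
    """
    base = x.mean(axis=-1, keepdims=True)
    return np.repeat(base, num_points, axis=-1)


def spatial_feedback(image, network, num_points, num_iters=2):
    """Spatial recurrence: feed the combined response map back to the input.

    The same network is applied at every iteration, rather than a cascade
    of separate models.
    """
    h, w, _ = image.shape
    response = np.zeros((h, w, 1))  # empty response map at the first pass
    heatmaps = None
    for _ in range(num_iters):
        x = np.concatenate([image, response], axis=-1)
        heatmaps = network(x, num_points)
        # Assumption: combine per-landmark maps with a per-pixel maximum.
        response = heatmaps.max(axis=-1, keepdims=True)
    return heatmaps


def split_bottleneck(code, num_identity):
    """Split a bottleneck vector into temporal-invariant and -variant parts."""
    return code[:num_identity], code[num_identity:]


def temporal_step(variant, hidden, w_in, w_h):
    """Temporal recurrence applied only to the temporal-variant features.

    Assumption: a plain tanh recurrent update, used here for brevity.
    """
    return np.tanh(w_in @ variant + w_h @ hidden)


# Usage on dummy data.
image = np.ones((8, 8, 3))
heatmaps = spatial_feedback(image, toy_network, num_points=5, num_iters=3)

identity, variant = split_bottleneck(np.arange(10.0), num_identity=4)
hidden = temporal_step(variant, np.zeros(3), 0.01 * np.ones((3, 6)), np.eye(3))
```

In this sketch the identity part of the bottleneck is left untouched across frames, while only the pose and expression part is passed through the recurrent update, mirroring the split described in the abstract.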
Taxonomy
Topics: Face recognition and analysis · Face and Expression Recognition · Face Recognition and Perception
