A Backbone Replaceable Fine-tuning Framework for Stable Face Alignment
Xu Sun, Zhenfeng Fan, Zihao Zhang, Yingjie Guo, Shihong Xia

TL;DR
This paper introduces a novel backbone replaceable framework with a Jitter loss function and ConvLSTM to improve stability and accuracy in video face alignment, effectively addressing noise and motion blur issues.
Contribution
It proposes a new video-oriented face alignment framework that enhances stability and accuracy using a Jitter loss and a ConvLSTM structure over a replaceable backbone.
Findings
Achieves at least 40% improvement in stability metrics
Enhances detection accuracy over state-of-the-art methods
Enables swift conversion of image-based detectors to video-optimized models
Abstract
Heatmap-regression-based face alignment performs well on static images. However, stability and accuracy degrade markedly when existing methods are applied to videos. We attribute the degradation to random noise and motion blur, both of which are common in videos. Temporal information is critical to addressing this issue, yet existing works do not fully exploit it. In this paper, we approach the video-oriented face alignment problem from two perspectives: detection accuracy, which favors lower error on each single frame, and detection consistency, which demands better stability between adjacent frames. On this basis, we propose a Jitter loss function that leverages temporal information to suppress both inaccurate and jittery landmarks. The Jitter loss is used in a novel framework that fine-tunes a ConvLSTM structure on top of a backbone-replaceable network. We…
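The abstract names two objectives (per-frame accuracy and consistency between adjacent frames) but does not give the Jitter loss formula. The sketch below is only an illustration of how such a two-term temporal loss could be written. It is an assumption, not the paper's definition: the function name `jitter_style_loss`, the parameter `consistency_weight`, and the choice to compare predicted and ground-truth frame-to-frame motion are all hypothetical.

```python
import numpy as np


def jitter_style_loss(pred, target, consistency_weight=1.0):
    """Illustrative two-term loss (NOT the paper's Jitter loss).

    pred, target: arrays of shape (T, K, 2), i.e. T frames,
    K landmarks, (x, y) coordinates.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)

    # Accuracy term: mean Euclidean landmark error over all frames.
    accuracy = np.linalg.norm(pred - target, axis=-1).mean()

    # Consistency term (assumed form): the predicted displacement between
    # adjacent frames should match the ground-truth displacement.
    if pred.shape[0] < 2:
        consistency = 0.0
    else:
        pred_motion = np.diff(pred, axis=0)
        true_motion = np.diff(target, axis=0)
        consistency = np.linalg.norm(pred_motion - true_motion, axis=-1).mean()

    return accuracy + consistency_weight * consistency


# A static face: 4 frames, 2 landmarks, all at the origin.
target = np.zeros((4, 2, 2))

# Same per-frame error (1 pixel in x), but one prediction is steady
# and the other flips sign every frame.
steady = target.copy()
steady[..., 0] = 1.0
jittery = target.copy()
jittery[..., 0] = np.array([1.0, -1.0, 1.0, -1.0])[:, None]

loss_steady = jitter_style_loss(steady, target)
loss_jittery = jitter_style_loss(jittery, target)
```

Both predictions have the same single-frame error, so an accuracy-only loss cannot tell them apart; under this assumed formulation, only the consistency term penalizes the frame-to-frame flipping.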
Taxonomy
Topics: Face recognition and analysis
Methods: Convolution · Tanh Activation · Sigmoid Activation · ConvLSTM
