DicFace: Dirichlet-Constrained Variational Codebook Learning for Temporally Coherent Video Face Restoration
Yan Chen, Hanlin Shang, Ce Liu, Yuxuan Chen, Hui Li, Weihao Yuan, Hao Zhu, Zilong Dong, Siyu Zhu

TL;DR
DicFace introduces a Dirichlet-constrained variational codebook learning framework that leverages pretrained VQ-VAEs and spatio-temporal Transformers to achieve temporally coherent and high-quality video face restoration, effectively reducing flicker artifacts.
Contribution
The paper proposes a novel Dirichlet-distributed latent space and a spatio-temporal Transformer architecture for improved video face restoration, extending static image priors to video tasks.
Findings
Achieves state-of-the-art results on face restoration benchmarks.
Effectively reduces flicker artifacts in video sequences.
Demonstrates versatility across multiple video restoration tasks.
Abstract
Video face restoration faces a critical challenge in maintaining temporal consistency while recovering fine facial details from degraded inputs. This paper presents a novel approach that extends Vector-Quantized Variational Autoencoders (VQ-VAEs), pretrained on static high-quality portraits, into a video restoration framework through variational latent space modeling. Our key innovation lies in reformulating discrete codebook representations as Dirichlet-distributed continuous variables, enabling probabilistic transitions between facial features across frames. A spatio-temporal Transformer architecture jointly models inter-frame dependencies and predicts latent distributions, while a Laplacian-constrained reconstruction loss combined with perceptual (LPIPS) regularization enhances both pixel accuracy and visual quality. Comprehensive evaluations on blind face restoration, video…
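The central idea in the abstract, replacing a hard codebook lookup with Dirichlet-distributed weights over codebook entries, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: the codebook, the concentration parameters `alpha`, and the helper names are all hypothetical, and in the paper the distribution parameters would be predicted by the spatio-temporal Transformer rather than fixed by hand. The sketch also assumes that the "Laplacian-constrained" reconstruction term behaves like an L1 loss, which is what a Laplace likelihood with fixed scale reduces to; the summary does not confirm the exact form.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def mix_codebook(weights, codebook):
    """Return the convex combination of codebook vectors under the given weights."""
    dim = len(codebook[0])
    return [sum(w * code[i] for w, code in zip(weights, codebook)) for i in range(dim)]


def l1_loss(pred, target):
    """Mean absolute error (Laplace negative log-likelihood up to scale and constant)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Hypothetical toy codebook: 3 entries of dimension 2 (real codebooks are far larger).
codebook = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical concentration parameters; a large alpha[0] favors the first entry.
alpha = [8.0, 1.0, 1.0]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
weights = sample_dirichlet(alpha, rng)    # continuous weights on the simplex
latent = mix_codebook(weights, codebook)  # soft latent instead of a hard lookup
loss = l1_loss(latent, codebook[0])       # distance to the first codebook entry
```

Because the weights lie on the probability simplex, the latent can move smoothly between codebook entries from one frame to the next instead of jumping between discrete indices, which is one plausible reading of how the approach reduces flicker.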
Taxonomy
Topics: Face recognition and analysis · Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis
