VFHQ: A High-Quality Dataset and Benchmark for Video Face Super-Resolution
Liangbin Xie, Xintao Wang, Honglun Zhang, Chao Dong, Ying Shan

TL;DR
This paper introduces VFHQ, a high-quality video face dataset, and demonstrates that video face super-resolution (VFSR) models trained on it produce sharper, more temporally consistent results than those trained on lower-quality datasets such as VoxCeleb1.
Contribution
The paper presents a new high-quality dataset for video face super-resolution and provides a benchmarking study of state-of-the-art algorithms using this dataset.
Findings
Models trained on VFHQ produce sharper edges and finer textures.
Temporal information significantly improves video consistency and visual quality.
Benchmarking shows that models trained on VFHQ outperform the same models trained on VoxCeleb1.
Abstract
Most of the existing video face super-resolution (VFSR) methods are trained and evaluated on VoxCeleb1, which is designed specifically for speaker identification and whose frames are of low quality. As a consequence, the VFSR models trained on this dataset cannot output visually pleasing results. In this paper, we develop an automatic and scalable pipeline to collect a high-quality video face dataset (VFHQ), which contains over high-fidelity clips of diverse interview scenarios. To verify the necessity of VFHQ, we further conduct experiments and demonstrate that VFSR models trained on our VFHQ dataset can generate results with sharper edges and finer textures than those trained on VoxCeleb1. In addition, we show that temporal information plays a pivotal role in eliminating video consistency issues as well as further improving visual performance. Based on VFHQ,…
Taxonomy
Topics: Advanced Image Processing Techniques · Speech and Audio Processing · Face recognition and analysis
