You Only Landmark Once: Lightweight U-Net Face Super Resolution with YOLO-World Landmark Heatmaps
Riccardo Carraro, Anna Briotto, Endi Hysa, Marco Fiorucci, Lamberto Ballan

TL;DR
This paper introduces a lightweight face super-resolution method using YOLO-World heatmaps for feature localization, achieving high-quality results without complex architectures or adversarial training.
Contribution
A novel supervision strategy leveraging YOLO-World heatmaps for face super-resolution, eliminating the need for dedicated landmark networks and reducing computational complexity.
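The summary does not say how the heatmaps are produced from YOLO-World's output. One simple way to get a landmark heatmap from an open-vocabulary detector is to render a Gaussian blob for each detected box (for example, boxes returned for text prompts such as "eye", "nose", "mouth") and merge the blobs with a pixel-wise max. This is a minimal sketch that assumes the detector has already returned boxes as (x1, y1, x2, y2) pixel coordinates. The function name boxes_to_heatmap, the box format, and the sigma_scale heuristic are hypothetical and not taken from the paper, which may derive its heatmaps differently (for instance, from the detector's internal activations).

```python
import numpy as np


def boxes_to_heatmap(boxes, height, width, sigma_scale=0.25):
    """Render one Gaussian blob per box and merge them with a pixel-wise max.

    Hypothetical helper: boxes is an iterable of (x1, y1, x2, y2) in pixel
    coordinates. Returns a float array of shape (height, width) in [0, 1].
    """
    # Row (y) and column (x) coordinate grids, each of shape (height, width).
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    heatmap = np.zeros((height, width), dtype=np.float64)
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        # Assumed heuristic: the blob's spread scales with the box size.
        sx = max((x2 - x1) * sigma_scale, 1e-6)
        sy = max((y2 - y1) * sigma_scale, 1e-6)
        blob = np.exp(-(((xs - cx) ** 2) / (2 * sx ** 2)
                        + ((ys - cy) ** 2) / (2 * sy ** 2)))
        # Keep the strongest response where blobs overlap.
        heatmap = np.maximum(heatmap, blob)
    return heatmap


# Usage with made-up boxes for two eyes, nose and mouth on a 128x128 face.
boxes = [(30, 40, 54, 56), (74, 40, 98, 56), (54, 60, 74, 84), (44, 90, 84, 108)]
hm = boxes_to_heatmap(boxes, 128, 128)
```

Taking the max rather than the sum keeps every feature's peak at exactly 1, so overlapping boxes do not inflate the weights of shared pixels.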
Findings
Achieves 8x magnification from 16x16 to 128x128 face images.
Improves reconstruction quality using heatmap-guided loss.
Produces sharper, more realistic face images without adversarial training.
Abstract
Face image super-resolution aims to recover high-resolution facial images from severely degraded inputs. Under extreme upscaling factors, fine facial details are often lost, making accurate reconstruction challenging. Existing methods typically rely on heavy network architectures, adversarial training schemes, or separate alignment networks, increasing model complexity and computational cost. To address these issues, we propose a lightweight U-Net-based architecture designed to reconstruct facial images from severely degraded inputs, achieving an 8x magnification. A key contribution is a novel supervision strategy, requiring no auxiliary training, that leverages heatmaps generated by YOLO-World, an open-vocabulary object detector, to localize key facial features such as eyes, nose, and mouth. These heatmaps are converted into spatial weights to form a…
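The abstract is cut off before it names the loss, so the following is only a sketch under one assumption: that the spatial weights up-weight a standard pixel-wise L1 loss around the localized features. Here the hypothetical helper heatmap_to_weights maps a heatmap h to weights 1 + alpha * h (alpha is an assumed hyperparameter), and weighted_l1 computes a weight-normalized L1 distance between the super-resolved output and the ground truth. Neither the names nor the formula come from the paper.

```python
import numpy as np


def heatmap_to_weights(heatmap, alpha=4.0):
    """Hypothetical mapping from a landmark heatmap to per-pixel weights >= 1."""
    h = heatmap - heatmap.min()
    peak = h.max()
    if peak > 0:
        h = h / peak  # rescale to [0, 1]
    # Background pixels get weight 1; feature peaks get weight 1 + alpha.
    return 1.0 + alpha * h


def weighted_l1(sr, hr, weights):
    """Weight-normalized L1 loss (an assumed form, not the paper's exact loss).

    sr, hr: super-resolved and ground-truth images, shape (H, W) or (H, W, C).
    weights: per-pixel weights of shape (H, W).
    """
    diff = np.abs(sr - hr)
    if diff.ndim == 3:
        weights = weights[..., None]  # broadcast the weights over color channels
    w = np.broadcast_to(weights, diff.shape)
    # Dividing by the total weight keeps the loss on the scale of a plain L1.
    return float(np.sum(w * diff) / np.sum(w))


# Usage: a toy 8x8 heatmap with a single "landmark" pixel.
hm = np.zeros((8, 8))
hm[3, 3] = 1.0
w = heatmap_to_weights(hm, alpha=4.0)
hr = np.zeros((8, 8, 3))
sr_on_landmark = hr.copy()
sr_on_landmark[3, 3, :] = 1.0    # error on the landmark pixel
sr_on_background = hr.copy()
sr_on_background[0, 0, :] = 1.0  # same error on a background pixel
```

With this normalization a uniform error of 0.5 still yields a loss of 0.5, while the same error on an eye, nose, or mouth pixel counts up to 1 + alpha times more than on the background, which is the kind of feature-focused supervision the abstract describes.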
