You Only Landmark Once: Lightweight U-Net Face Super Resolution with YOLO-World Landmark Heatmaps

Riccardo Carraro; Anna Briotto; Endi Hysa; Marco Fiorucci; Lamberto Ballan

arXiv:2605.14166·cs.CV·May 15, 2026


TL;DR

This paper introduces a lightweight face super-resolution method using YOLO-World heatmaps for feature localization, achieving high-quality results without complex architectures or adversarial training.

Contribution

A novel supervision strategy leveraging YOLO-World heatmaps for face super-resolution, eliminating the need for dedicated landmark networks and reducing computational complexity.

Findings

01

Achieves 8x magnification from 16x16 to 128x128 face images.

02

Improves reconstruction quality using heatmap-guided loss.

03

Produces sharper, more realistic face images without adversarial training.

Abstract

Face image super-resolution aims to recover high-resolution facial images from severely degraded inputs. Under extreme upscaling factors, fine facial details are often lost, making accurate reconstruction challenging. Existing methods typically rely on heavy network architectures, adversarial training schemes, or separate alignment networks, increasing model complexity and computational cost. To address these issues, we propose a lightweight U-Net-based architecture designed to reconstruct $128 \times 128$ facial images from severely degraded $16 \times 16$ inputs, achieving an $8 \times$ magnification. A key contribution is a novel auxiliary-training-free supervision strategy that leverages heatmaps generated by YOLO-World, an open-vocabulary object detector, to localize key facial features such as eyes, nose, and mouth. These heatmaps are converted into spatial weights to form a…
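The abstract is cut off before it defines the loss, so the exact formulation is not available here. As a rough illustration of the general idea of turning a heatmap into spatial weights for a reconstruction loss, the sketch below assumes a min-max normalised heatmap and a weighted L1 penalty. The function names `heatmap_to_weights` and `weighted_l1` and the parameters `base` and `gain` are hypothetical, not taken from the paper, and plain Python lists stand in for image tensors.

```python
# Illustrative sketch only: the paper's actual loss is not shown in the
# truncated abstract. Names and parameters here are assumptions.

def heatmap_to_weights(heatmap, base=1.0, gain=4.0):
    """Map a 2D heatmap to per-pixel weights in [base, base + gain].

    Pixels with the highest heatmap response (e.g. eyes, nose, mouth)
    get the largest weight; a flat heatmap falls back to uniform weights.
    """
    flat = [v for row in heatmap for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[base for _ in row] for row in heatmap]
    return [[base + gain * (v - lo) / span for v in row] for row in heatmap]


def weighted_l1(pred, target, weights):
    """Weighted mean absolute error, normalised by the sum of the weights."""
    num = 0.0
    den = 0.0
    for p_row, t_row, w_row in zip(pred, target, weights):
        for p, t, w in zip(p_row, t_row, w_row):
            num += w * abs(p - t)
            den += w
    return num / den


# Toy 2x2 example: the only reconstruction error sits on the pixel that
# the heatmap marks as important, so the weighted loss exceeds the plain one.
heatmap = [[0.0, 1.0], [0.0, 0.0]]
pred = [[0.0, 0.0], [0.0, 0.0]]
target = [[0.0, 1.0], [0.0, 0.0]]

weights = heatmap_to_weights(heatmap)
uniform = [[1.0, 1.0], [1.0, 1.0]]

loss_weighted = weighted_l1(pred, target, weights)
loss_uniform = weighted_l1(pred, target, uniform)
```

In this toy case an error on a high-response pixel costs more than the same error would under uniform weighting, which is the intuition behind a heatmap-guided loss. In practice the same computation would run on $128 \times 128$ tensors in a deep learning framework, and the weighting scheme may differ from the one assumed here.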
