LLV-FSR: Exploiting Large Language-Vision Prior for Face Super-resolution
Chenyang Wang, Wenjie An, Kui Jiang, Xianming Liu, Junjun Jiang

TL;DR
This paper introduces LLV-FSR, a face super-resolution framework that leverages large vision-language models and pluralistic priors like captions and depth maps to enhance reconstruction quality and perceptual realism.
Contribution
Integrates vision-language priors into face super-resolution, so that higher-order semantic and non-visual information can be used to improve results.
Findings
Surpasses the state of the art (SOTA) by 0.43 dB in PSNR on the MMCelebA-HQ dataset
Significantly improves both reconstruction and perceptual quality
Effectively incorporates pluralistic priors like captions and depth maps
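To put the 0.43 dB figure in context, the following minimal sketch shows the standard PSNR definition and what a given dB gain implies for mean squared error at a fixed peak value. It is an illustration only: the paper's exact evaluation protocol (for example, colour space or cropping) is not stated here, the names `psnr` and `mse_ratio_for_gain` are hypothetical, and the pixel values are made up.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Standard PSNR in dB between two equally sized flat pixel sequences."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Ratio new_mse / old_mse implied by a PSNR gain, for a fixed peak value."""
    return 10.0 ** (-gain_db / 10.0)


# Made-up 4-pixel example, not data from the paper.
hr = [52, 60, 61, 70]   # hypothetical ground-truth pixels
sr = [50, 62, 60, 73]   # hypothetical super-resolved pixels
score = psnr(hr, sr)

# MSE ratio implied by the reported 0.43 dB gain.
ratio = mse_ratio_for_gain(0.43)
print(f"PSNR = {score:.2f} dB, implied MSE ratio = {ratio:.3f}")
```

Because PSNR is logarithmic in the error, a gain of 0.43 dB corresponds to roughly a 9% reduction in mean squared error, assuming both methods are scored against the same peak value.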
Abstract
Existing face super-resolution (FSR) methods have made significant advances, but they primarily super-resolve faces from limited visual information, in particular the original pixel-wise space, and commonly overlook pluralistic clues such as higher-order depth and semantics, as well as non-visual inputs (text captions and descriptions). Consequently, these methods struggle to produce a unified and meaningful representation from the input face. We suppose that introducing the language-vision pluralistic representation into an unexplored potential embedding space could enhance FSR by encoding and exploiting the complementarity across language-vision priors. This motivates us to propose a new framework called LLV-FSR, which marries the power of large vision-language models and higher-order visual priors with the challenging task of FSR. Specifically, besides directly absorbing knowledge from…
Taxonomy
Topics: Advanced Image Processing Techniques · Face recognition and analysis
