CLIP-SR: Collaborative Linguistic and Image Processing for Super-Resolution
Bingwen Hu, Heng Liu, Zhedong Zheng, and Ping Liu

TL;DR
CLIP-SR introduces a multi-modal framework that combines textual and visual features to produce high-quality, semantically consistent super-resolution at large upscaling factors, outperforming traditional CNN methods, especially at 8× and 16× upscaling.
Contribution
The paper presents a novel multi-modal super-resolution method that integrates CLIP-based textual semantics with visual features for enhanced detail and semantic coherence at large upscaling factors.
Findings
Effective super-resolution at 16× scaling with high semantic fidelity.
Outperforms existing CNN and text-guided SR methods in quality and consistency.
Enables controllable SR with semantic preservation.
Abstract
Convolutional Neural Networks (CNNs) have significantly advanced Image Super-Resolution (SR), yet most CNN-based methods rely solely on pixel-based transformations, often leading to artifacts and blurring, particularly under severe downsampling rates (e.g., 8× or 16×). The recently developed text-guided SR approaches leverage textual descriptions to enhance their detail restoration capabilities but frequently struggle to perform alignment effectively, resulting in semantic inconsistencies. To address these challenges, we propose a multi-modal semantic enhancement framework that integrates textual semantics with visual features, effectively mitigating semantic mismatches and detail losses in highly degraded low-resolution (LR) images. Our method enables realistic, high-quality SR to be performed at large upscaling factors, with a maximum scaling ratio of 16. The…
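To give a rough sense of how severe the 8× and 16× downsampling rates mentioned in the abstract are, the short sketch below computes the low-resolution input size and the number of high-resolution pixels each input pixel has to account for. The 256×256 target size and the helper names `lr_size` and `pixels_per_lr_pixel` are illustrative assumptions, not values or functions taken from the paper.

```python
def lr_size(hr_height, hr_width, scale):
    """Return the (height, width) of an LR image downsampled by an integer factor."""
    if hr_height % scale or hr_width % scale:
        raise ValueError("HR dimensions must be divisible by the scale factor")
    return hr_height // scale, hr_width // scale


def pixels_per_lr_pixel(scale):
    """Number of HR pixels that each LR pixel must be expanded into."""
    return scale * scale


# Hypothetical 256x256 HR target, chosen only for illustration.
for scale in (8, 16):
    h, w = lr_size(256, 256, scale)
    print(f"{scale}x: LR input {h}x{w}, "
          f"{pixels_per_lr_pixel(scale)} HR pixels per LR pixel")
```

At 16×, each input pixel corresponds to 256 output pixels, which is consistent with the abstract's point that purely pixel-based methods tend to produce blur and artifacts at these factors.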
Taxonomy
Topics: Advanced Image Processing Techniques · Medical Imaging Techniques and Applications · Seismic Imaging and Inversion Techniques
Methods: Contrastive Language-Image Pre-training
