CLIP-SR: Collaborative Linguistic and Image Processing for Super-Resolution

Bingwen Hu, Heng Liu, Zhedong Zheng, and Ping Liu

arXiv:2412.11609 · cs.CV · April 15, 2025

TL;DR

CLIP-SR introduces a multi-modal framework that combines textual and visual features to produce high-quality, semantically consistent super-resolution at large upscaling factors, outperforming traditional CNN methods, especially at 8× and 16× upscaling.

Contribution

The paper presents a novel multi-modal super-resolution method that integrates CLIP-based textual semantics with visual features for enhanced detail and semantic coherence at large upscaling factors.

Findings

01 Effective super-resolution at 16× scaling with high semantic fidelity.

02 Outperforms existing CNN and text-guided SR methods in quality and consistency.

03 Enables controllable SR with semantic preservation.

Abstract

Convolutional Neural Networks (CNNs) have significantly advanced Image Super-Resolution (SR), yet most CNN-based methods rely solely on pixel-based transformations, often leading to artifacts and blurring, particularly under severe downsampling rates (e.g., 8× or 16×). The recently developed text-guided SR approaches leverage textual descriptions to enhance their detail restoration capabilities but frequently struggle with effectively performing alignment, resulting in semantic inconsistencies. To address these challenges, we propose a multi-modal semantic enhancement framework that integrates textual semantics with visual features, effectively mitigating semantic mismatches and detail losses in highly degraded low-resolution (LR) images. Our method enables realistic, high-quality SR to be performed at large upscaling factors, with a maximum scaling ratio of 16×. The…

Taxonomy

Topics: Advanced Image Processing Techniques · Medical Imaging Techniques and Applications · Seismic Imaging and Inversion Techniques

Methods: Contrastive Language-Image Pre-training