Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration

Yuang Ai; Huaibo Huang; Xiaoqiang Zhou; Jiexiang Wang; Ran He

arXiv:2312.02918·cs.CV·March 21, 2024·2 cites

Open Access

TL;DR

MPerceiver is a multimodal prompt learning framework that uses Stable Diffusion priors and degradation-adaptive prompts to improve the adaptiveness, fidelity, and generalizability of all-in-one image restoration across diverse real-world degradations.

Contribution

The paper proposes a dual-branch module that learns textual and visual prompts, both adjusted by degradation predictions from the CLIP image encoder, together with a plug-in detail refinement module that improves restoration fidelity.

Findings

01. Outperforms state-of-the-art task-specific methods on 9 IR tasks

02. Achieves remarkable zero-shot and few-shot capabilities on unseen tasks

03. Demonstrates superior adaptiveness, generalizability, and fidelity across 16 IR tasks

Abstract

Despite substantial progress, all-in-one image restoration (IR) grapples with persistent challenges in handling intricate real-world degradations. This paper introduces MPerceiver: a novel multimodal prompt learning approach that harnesses Stable Diffusion (SD) priors to enhance adaptiveness, generalizability and fidelity for all-in-one image restoration. Specifically, we develop a dual-branch module to master two types of SD prompts: textual for holistic representation and visual for multiscale detail representation. Both prompts are dynamically adjusted by degradation predictions from the CLIP image encoder, enabling adaptive responses to diverse unknown degradations. Moreover, a plug-in detail refinement module improves restoration fidelity via direct encoder-to-decoder information transformation. To assess our method, MPerceiver is trained on 9 tasks for all-in-one IR and…
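The abstract says both prompts are adjusted by degradation predictions but does not give the mechanism. One plausible reading, sketched below purely as an illustration, is that predicted degradation probabilities weight a pool of learned prompt vectors. The names `adaptive_prompt` and `prompt_pool`, the softmax weighting, and the toy numbers are all assumptions and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_prompt(degradation_logits, prompt_pool):
    """Blend prompt vectors by predicted degradation probabilities.

    Hypothetical helper: assumes one prompt vector per degradation type
    and a simple weighted sum, which the abstract does not confirm.
    """
    if len(degradation_logits) != len(prompt_pool):
        raise ValueError("need one prompt vector per degradation logit")
    weights = softmax(degradation_logits)
    dim = len(prompt_pool[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, prompt_pool))
        for i in range(dim)
    ]


# Toy pool: three made-up degradation types, each with a 2-D prompt vector.
pool = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Uniform logits blend all prompts equally.
uniform = adaptive_prompt([0.0, 0.0, 0.0], pool)

# A confident prediction for the first type selects (almost) only its prompt.
peaked = adaptive_prompt([20.0, -20.0, -20.0], pool)
```

In this sketch an uncertain prediction yields a mixture of prompts, while a confident one approaches a single prompt, which is one way a model could respond to unknown or mixed degradations.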

Taxonomy

Topics: Advanced Image Processing Techniques · Image Enhancement Techniques · Advanced Optical Sensing Technologies

Methods: Contrastive Language-Image Pre-training · Diffusion