Learning Domain-Aware Task Prompt Representations for Multi-Domain All-in-One Image Restoration
Guanglu Dong, Chunlei Li, Chao Ren, Jingliang Hu, Yilei Shi, Xiao Xiang Zhu, Lichao Mou

TL;DR
This paper introduces a novel multi-domain all-in-one image restoration method that adaptively combines task and domain prompts, significantly improving performance and generalization across various image restoration tasks and domains.
Contribution
The paper proposes the first multi-domain all-in-one image restoration framework using domain-aware task prompt representation learning, with adaptive prompt selection and domain priors distillation.
Findings
Outperforms state-of-the-art methods on multiple image restoration tasks.
Demonstrates strong generalization across diverse domains.
Effectively leverages multimodal large language models for domain priors.
Abstract
Recently, significant breakthroughs have been made in all-in-one image restoration (AiOIR), which can handle multiple restoration tasks with a single model. However, existing methods typically focus on a specific image domain, such as natural scene, medical imaging, or remote sensing. In this work, we aim to extend AiOIR to multiple domains and propose the first multi-domain all-in-one image restoration method, DATPRL-IR, based on our proposed Domain-Aware Task Prompt Representation Learning. Specifically, we first construct a task prompt pool containing multiple task prompts, in which task-related knowledge is implicitly encoded. For each input image, the model adaptively selects the most relevant task prompts and composes them into an instance-level task representation via a prompt composition mechanism (PCM). Furthermore, to endow the model with domain awareness, we introduce another…
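The abstract's select-then-compose step can be illustrated with a minimal sketch. The abstract does not specify how relevance is scored or how the PCM combines prompts, so everything here is an assumption for illustration: the learnable keys, the cosine-similarity scoring, the top-k selection, the softmax weighting, and the names `compose_prompt`, `keys`, `prompts`, and `top_k` are hypothetical and are not taken from DATPRL-IR.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compose_prompt(query, keys, prompts, top_k=2):
    """Pick the top_k prompts whose keys best match the query, then blend them.

    Assumed scheme (not the paper's PCM): cosine scoring against per-prompt
    keys, followed by a softmax-weighted sum of the selected prompts.
    """
    sims = [cosine(query, key) for key in keys]
    ranked = sorted(range(len(keys)), key=lambda i: sims[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([sims[i] for i in chosen])
    dim = len(prompts[0])
    composed = [
        sum(w * prompts[i][d] for w, i in zip(weights, chosen))
        for d in range(dim)
    ]
    return chosen, weights, composed


# Toy prompt pool: 6 prompts, each with a 4-d key and an 8-d prompt vector.
random.seed(0)
pool_size, key_dim, prompt_dim = 6, 4, 8
keys = [[random.gauss(0, 1) for _ in range(key_dim)] for _ in range(pool_size)]
prompts = [[random.gauss(0, 1) for _ in range(prompt_dim)] for _ in range(pool_size)]

# Stand-in for an image feature; here it simply copies key 3 so that
# prompt 3 is guaranteed to rank first.
query = list(keys[3])
chosen, weights, composed = compose_prompt(query, keys, prompts, top_k=2)
```

In this toy setup `chosen` lists the indices of the selected prompts in descending relevance, `weights` sums to one, and `composed` plays the role of an instance-level representation. A second pool for domain prompts could reuse the same function under the same assumptions.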
Peer Reviews
Decision·ICLR 2026 Poster
1. The problem scope is novel and interesting; from my point of view, this is the first attempt to unify AiOIR across multiple task domains. 2. The dual prompt pools and cross-modal alignment are practically effective and are supported by comprehensive experiments with strong empirical performance and ablation studies. 3. The figures and writing are easy to follow.
1. Although the evaluation covers multiple tasks from three domains, it is still unclear how the method performs on unseen domains in a zero-shot setting. It is important to provide such experimental results to evaluate whether the proposed method is truly practical across domains. 2. LLaVA seems to generate captions that may exceed the 77-token limit of the CLIP text encoder. Which part of the text is useful for cross-modal alignment is worth exploring. 3. The authors only provide
1. The implementation is comprehensive, involving a structured dual-prompt architecture with cross-modal alignment using LLMs (LLaVA/CLIP), regularization mechanisms, and adaptive fusion. 2. Experiments are conducted on a relatively large number of tasks and datasets, covering natural, remote sensing, and medical images. 3. Ablation studies and visualizations are detailed and well-presented, providing insights into the behavior of the model.
1. The central claim that "Multi-domain All-in-One Image Restoration" is a novel and meaningful research problem is not sufficiently justified. The distinction between MD-AiOIR and regular AiOIR is weak. The proposed setting is essentially a standard multi-task image restoration framework with more diverse datasets included. The fact that earlier works did not include medical or remote sensing datasets does not inherently make this a new problem. The paper does not provide a convincing motivatio
The article is well-structured and easy to understand. By separating image types from restoration tasks, the authors introduce an interesting approach to restoring different types of images. This method avoids confusion between types and tasks, allowing the model to better perform specific tasks in specific domains.
1. Motivation: The authors propose a unified image restoration model capable of handling various types of images. This raises several questions: Why is this approach necessary, and why were previous methods unable to achieve this? The authors briefly mention in the introduction that earlier methods did not accomplish such unification, but they do not analyze why those methods were incapable of doing so. 2. How to handle multi-modal data inputs: The paper addresses multiple domains but doe
Taxonomy
Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
