Instruct-IPT: All-in-One Image Processing Transformer via Weight Modulation
Yuchuan Tian, Jianhong Han, Hanting Chen, Yuanyuan Xi, Ning Ding, Jie Hu, Chao Xu, Yunhe Wang

TL;DR
Instruct-IPT is an All-in-One image processing transformer that uses weight modulation with task-specific biases to handle diverse low-level vision tasks, including denoising, deblurring, deraining, dehazing, and desnowing, in a single model.
Contribution
The paper introduces a novel weight modulation approach with task-specific biases and low-rank decomposition, enabling a single model to excel across multiple distinct image restoration tasks.
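As a rough illustration of the idea, the sketch below shows one way a shared weight matrix could be combined with a per-task low-rank bias. It is not the authors' implementation: the class name `TaskModulatedLinear`, its parameters (`num_tasks`, `rank`), the initialization scheme, and the choice of a plain linear layer are all assumptions made for this example. The source does not specify which layers are modulated or how the rank is chosen.

```python
import numpy as np


class TaskModulatedLinear:
    """Hypothetical linear layer: shared weight plus a per-task low-rank bias.

    For task t the effective weight is W_shared + up[t] @ down[t],
    so each task adds only rank * (in_dim + out_dim) extra parameters.
    """

    def __init__(self, in_dim, out_dim, num_tasks, rank, seed=0):
        rng = np.random.default_rng(seed)
        # Weight shared by every task.
        self.shared_weight = rng.standard_normal((out_dim, in_dim)) * 0.02
        # Per-task low-rank factors (assumed layout, not from the paper).
        self.down = rng.standard_normal((num_tasks, rank, in_dim)) * 0.02
        # Zero-initialized so every task starts from the shared weight.
        self.up = np.zeros((num_tasks, out_dim, rank))

    def task_weight(self, task_id):
        # Low-rank, task-specific bias added on top of the shared weight.
        delta = self.up[task_id] @ self.down[task_id]
        return self.shared_weight + delta

    def __call__(self, x, task_id):
        # x has shape (batch, in_dim); the result has shape (batch, out_dim).
        return x @ self.task_weight(task_id).T


# Usage: one layer serving five tasks (e.g. denoise, deblur, derain, dehaze, desnow).
layer = TaskModulatedLinear(in_dim=8, out_dim=4, num_tasks=5, rank=2)
features = np.ones((3, 8))
denoise_out = layer(features, task_id=0)
```

In this sketch the task is selected by an integer index, and only the small `up` and `down` factors differ between tasks, while the bulk of the parameters stays shared.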
Findings
Effective multi-task performance on various image restoration tasks.
Enhanced cooperation between tasks with distinct characteristics.
Extension of the method to diffusion denoisers.
Abstract
Due to the unaffordable size and intensive computation costs of low-level vision models, All-in-One models that are designed to address a handful of low-level vision tasks simultaneously have been popular. However, existing All-in-One models are limited in terms of the range of tasks and performance. To overcome these limitations, we propose Instruct-IPT -- an All-in-One Image Processing Transformer (IPT) that could effectively address manifold image restoration tasks with large inter-task gaps, such as denoising, deblurring, deraining, dehazing, and desnowing. While most research proposes feature adaptation methods, we reveal their failure in addressing highly distinct tasks, and suggest weight modulation that adapts weights to specific tasks. Firstly, we search for task-sensitive weights and introduce task-specific biases on top of them. Secondly, we conduct rank analysis for a good…
Taxonomy
Topics: CCD and CMOS Imaging Sensors
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Diffusion · Position-Wise Feed-Forward Layer · Adam
