Unifying Image Processing as Visual Prompting Question Answering
Yihao Liu, Xiangyu Chen, Xianzheng Ma, Xintao Wang, Jiantao Zhou, Yu Qiao, Chao Dong

TL;DR
PromptGIP introduces a universal visual prompting question answering framework that unifies various image processing tasks, reducing the need for task-specific models and enabling cross-domain generalization.
Contribution
It proposes a novel visual prompting QA paradigm for general image processing, covering low-level and high-level tasks within a single adaptable framework.
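As a rough illustration of what a visual question-answering prompt layout could look like, the sketch below arranges example input/output image pairs as "question"/"answer" slots, then appends the query image with its answer slot left empty for a model to fill. This summary does not spell out the actual input format, so everything here is an assumption made for illustration: the `Slot` type, the `build_prompt` and `mask_positions` helpers, and the toy nested-list images are hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Toy grayscale image: rows of pixel values (an illustrative stand-in only).
Image = List[List[float]]


@dataclass
class Slot:
    role: str                # "question" (input image) or "answer" (target image)
    image: Optional[Image]   # None marks the masked slot a model would predict


def build_prompt(examples: List[Tuple[Image, Image]], query: Image) -> List[Slot]:
    """Lay out example (input, output) pairs, then the query with a masked answer."""
    slots: List[Slot] = []
    for question, answer in examples:
        slots.append(Slot("question", question))
        slots.append(Slot("answer", answer))
    slots.append(Slot("question", query))
    slots.append(Slot("answer", None))  # hypothetical masked target
    return slots


def mask_positions(slots: List[Slot]) -> List[int]:
    """Return the indices of slots whose image is masked."""
    return [i for i, slot in enumerate(slots) if slot.image is None]


# Toy task demonstrated by one example pair: brighten every pixel by 0.5.
dark = [[0.0, 0.25], [0.25, 0.0]]
bright = [[0.5, 0.75], [0.75, 0.5]]
query = [[0.1, 0.1], [0.1, 0.1]]

prompt = build_prompt([(dark, bright)], query)
```

In this layout the task is never named explicitly; it is implied only by the example pair, which is the sense in which a single model could be steered toward different image processing tasks without fine-tuning.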
Findings
Demonstrates capability to handle diverse image processing tasks without fine-tuning
Shows potential for out-of-domain task generalization
Provides a unified approach inspired by NLP question answering techniques
Abstract
Image processing is a fundamental task in computer vision, which aims at enhancing image quality and extracting essential features for subsequent vision applications. Traditionally, task-specific models are developed for individual tasks and designing such models requires distinct expertise. Building upon the success of large language models (LLMs) in natural language processing (NLP), there is a similar trend in computer vision, which focuses on developing large-scale models through pretraining and in-context learning. This paradigm shift reduces the reliance on task-specific models, yielding a powerful unified model to deal with various tasks. However, these advances have predominantly concentrated on high-level vision tasks, with less attention paid to low-level vision tasks. To address this issue, we propose a universal model for general image processing that covers image…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
