Exploring Scalable Unified Modeling for General Low-Level Vision

Xiangyu Chen; Kaiwen Zhu; Yuandong Pu; Shuo Cao; Xiaohui Li; Wenlong Zhang; Yihao Liu; Yu Qiao; Jiantao Zhou; and Chao Dong

arXiv:2507.14801·cs.CV·July 22, 2025

TL;DR

This paper introduces VPIP, a unified framework for low-level vision tasks using visual prompts, demonstrating scalable, adaptable performance across diverse tasks and settings, including zero-shot and few-shot scenarios.

Contribution

The paper proposes a novel prompt-based unified model for low-level vision that scales and generalizes across more than 100 tasks, with strong zero-shot and few-shot capabilities.

Findings

01. Achieves strong performance across diverse low-level vision tasks.

02. Model generalizes well in zero-shot and few-shot settings.

03. Increasing task diversity improves model robustness and transferability.

Abstract

Low-level vision involves a wide spectrum of tasks, including image restoration, enhancement, stylization, and feature extraction, which differ significantly in both task formulation and output domains. To address the challenge of unified modeling across such diverse tasks, we propose a Visual task Prompt-based Image Processing (VPIP) framework that leverages input-target image pairs as visual prompts to guide the model in performing a variety of low-level vision tasks. The framework comprises an end-to-end image processing backbone, a prompt encoder, and a prompt interaction module, enabling flexible integration with various architectures and effective utilization of task-specific visual representations. Based on this design, we develop a unified low-level vision model, GenLV, and evaluate its performance across multiple representative tasks. To explore the scalability of this…
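
To make the prompt-driven data flow concrete, here is a toy sketch of the three components named in the abstract: a prompt encoder, a backbone, and a prompt interaction step. It is not GenLV or the authors' implementation. All function names are hypothetical, images are flattened lists of floats, and a least-squares affine fit stands in for the learned networks purely for illustration.

```python
from typing import List, Tuple

Image = List[float]  # toy stand-in: a flattened grayscale image in [0, 1]


def encode_prompt(prompt_input: Image, prompt_target: Image) -> Tuple[float, float]:
    """Hypothetical prompt encoder: summarize the input->target pair as (gain, bias).

    Assumption for illustration only: fits target ~= gain * input + bias
    by ordinary least squares instead of using a learned encoder.
    """
    n = len(prompt_input)
    mean_x = sum(prompt_input) / n
    mean_y = sum(prompt_target) / n
    var_x = sum((x - mean_x) ** 2 for x in prompt_input)
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(prompt_input, prompt_target))
    gain = cov_xy / var_x if var_x > 0 else 1.0
    bias = mean_y - gain * mean_x
    return gain, bias


def backbone_features(query: Image) -> Image:
    """Hypothetical backbone stage: identity features, to keep the sketch tiny."""
    return list(query)


def prompt_interaction(features: Image, task_code: Tuple[float, float]) -> Image:
    """Hypothetical interaction step: modulate backbone features with the task code."""
    gain, bias = task_code
    return [gain * f + bias for f in features]


def process(query: Image, prompt_input: Image, prompt_target: Image) -> Image:
    """Run a query image through the toy pipeline, guided by one prompt pair."""
    task_code = encode_prompt(prompt_input, prompt_target)
    return prompt_interaction(backbone_features(query), task_code)


query = [0.1, 0.5, 0.9]
# Same query, two different prompt pairs, two different behaviors.
brighten = process(query, [0.0, 0.4, 0.8], [0.2, 0.6, 1.0])  # pair demonstrates x + 0.2
invert = process(query, [0.0, 0.4, 0.8], [1.0, 0.6, 0.2])    # pair demonstrates 1 - x
```

The point of the sketch is only that the task is selected by the prompt pair rather than by separate weights per task: swapping the pair changes what the same pipeline does to the query.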
