Understanding Degradation with Vision Language Model
Guanzhou Lan, Chenyi Liao, Yuqi Yang, Qianli Ma, Zhigang Wang, Dong Wang, Bin Zhao, Xuelong Li

TL;DR
This paper introduces DU-VLM, a hierarchical vision-language model that understands image degradations through structured prediction, enabling high-fidelity image restoration and outperforming baselines in accuracy and robustness.
Contribution
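The work presents an autoregressive framework that unifies the sub-tasks of degradation understanding (predicting degradation types, parameter keys, and continuous physical values) and introduces DU-VLM, a multimodal chain-of-thought model trained with supervised fine-tuning and reinforcement learning for degradation analysis.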
Findings
DU-VLM outperforms baseline models in accuracy and robustness.
The model generalizes well to unseen degradation distributions.
The large-scale DU-110k dataset supports effective training and evaluation.
Abstract
Understanding visual degradations is a critical yet challenging problem in computer vision. While recent Vision-Language Models (VLMs) excel at qualitative description, they often fall short in understanding the parametric physics underlying image degradations. In this work, we redefine degradation understanding as a hierarchical structured prediction task, necessitating the concurrent estimation of degradation types, parameter keys, and their continuous physical values. Although these sub-tasks operate in disparate spaces, we prove that they can be unified under one autoregressive next-token prediction paradigm, whose error is bounded by the value-space quantization grid. Building on this insight, we introduce DU-VLM, a multimodal chain-of-thought model trained with supervised fine-tuning and reinforcement learning using structured rewards. Furthermore, we show that DU-VLM can serve as…
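To illustrate the idea of casting type, key, and value prediction as a single token sequence, here is a minimal sketch. It assumes a uniform quantization grid over a known value range; the token format, the grid, and the function names (quantize, dequantize, encode, decode) are hypothetical and are not taken from the paper. The point is only that, with a uniform grid, the round-trip error on the continuous value is at most half the grid step.

```python
def grid_step(lo, hi, bins):
    """Spacing between adjacent grid points of a uniform grid on [lo, hi]."""
    return (hi - lo) / (bins - 1)


def quantize(value, lo, hi, bins):
    """Map a continuous value to the index of its nearest grid point."""
    clipped = min(max(value, lo), hi)  # keep the value inside the grid range
    return round((clipped - lo) / grid_step(lo, hi, bins))


def dequantize(index, lo, hi, bins):
    """Map a grid index back to its continuous value."""
    return lo + index * grid_step(lo, hi, bins)


def encode(deg_type, key, value, lo, hi, bins):
    """Serialize one degradation as three tokens: type, key, quantized value.

    The "<type:...>" / "<key:...>" / "<val:...>" format is an assumption
    made for this sketch, not the paper's vocabulary.
    """
    return [
        f"<type:{deg_type}>",
        f"<key:{key}>",
        f"<val:{quantize(value, lo, hi, bins)}>",
    ]


def decode(tokens, lo, hi, bins):
    """Invert encode(): recover type, key, and the dequantized value."""
    deg_type = tokens[0][len("<type:"):-1]
    key = tokens[1][len("<key:"):-1]
    index = int(tokens[2][len("<val:"):-1])
    return deg_type, key, dequantize(index, lo, hi, bins)


# Example: a blur strength of 2.37 on a 101-point grid over [0, 5].
tokens = encode("gaussian_blur", "sigma", 2.37, 0.0, 5.0, 101)
deg_type, key, recovered = decode(tokens, 0.0, 5.0, 101)

# The reconstruction error never exceeds half the grid step.
assert abs(recovered - 2.37) <= grid_step(0.0, 5.0, 101) / 2 + 1e-12
```

A finer grid (more bins) tightens the bound at the cost of a larger value vocabulary, which is the trade-off a quantization-bounded error implies.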
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
