Understanding Degradation with Vision Language Model

Guanzhou Lan; Chenyi Liao; Yuqi Yang; Qianli Ma; Zhigang Wang; Dong Wang; Bin Zhao; Xuelong Li

arXiv:2602.04565 · cs.CV · February 5, 2026

TL;DR

This paper introduces DU-VLM, a hierarchical vision-language model that understands image degradations through structured prediction, enabling high-fidelity image restoration and outperforming baselines in accuracy and robustness.

Contribution

The work presents a novel autoregressive framework unifying degradation understanding tasks and introduces DU-VLM, a multimodal model trained with supervised and reinforcement learning for degradation analysis.

Findings

01 DU-VLM outperforms baseline models in accuracy and robustness.

02 The model generalizes well to unseen degradation distributions.

03 The large-scale DU-110k dataset supports effective training and evaluation.

Abstract

Understanding visual degradations is a critical yet challenging problem in computer vision. While recent Vision-Language Models (VLMs) excel at qualitative description, they often fall short in understanding the parametric physics underlying image degradations. In this work, we redefine degradation understanding as a hierarchical structured prediction task, necessitating the concurrent estimation of degradation types, parameter keys, and their continuous physical values. Although these sub-tasks operate in disparate spaces, we prove that they can be unified under one autoregressive next-token prediction paradigm, whose error is bounded by the value-space quantization grid. Building on this insight, we introduce DU-VLM, a multimodal chain-of-thought model trained with supervised fine-tuning and reinforcement learning using structured rewards. Furthermore, we show that DU-VLM can serve as…
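
The abstract says the three sub-tasks (degradation type, parameter key, continuous value) can share one next-token sequence, with error bounded by the value-space quantization grid. The sketch below illustrates only that general idea and is not the authors' implementation: the token format, the `ValueGrid` class, the `gaussian_blur`/`sigma` example, and the uniform grid over [0, 10] are all assumptions made for illustration. With a uniform grid and bin-centre decoding, the round-trip error of any in-range value is at most half the grid step.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ValueGrid:
    """Uniform quantization grid over [lo, hi] (hypothetical, for illustration)."""
    lo: float
    hi: float
    n_bins: int

    @property
    def step(self) -> float:
        return (self.hi - self.lo) / self.n_bins

    def encode(self, value: float) -> int:
        """Map a continuous value to the index of the bin that contains it."""
        clipped = min(max(value, self.lo), self.hi)
        index = int((clipped - self.lo) / self.step)
        return min(index, self.n_bins - 1)  # value == hi falls in the last bin

    def decode(self, index: int) -> float:
        """Map a bin index back to the centre of that bin."""
        return self.lo + (index + 0.5) * self.step


def to_tokens(deg_type: str, key: str, value: float, grid: ValueGrid) -> list[str]:
    """Serialize (type, key, value) as one discrete token sequence."""
    return [f"<type:{deg_type}>", f"<key:{key}>", f"<val:{grid.encode(value)}>"]


def from_tokens(tokens: list[str], grid: ValueGrid) -> tuple[str, str, float]:
    """Parse the token sequence back into (type, key, dequantized value)."""
    deg_type = tokens[0][len("<type:"):-1]
    key = tokens[1][len("<key:"):-1]
    index = int(tokens[2][len("<val:"):-1])
    return deg_type, key, grid.decode(index)


# Assumed example: a blur strength of 2.37 on a 100-bin grid over [0, 10].
grid = ValueGrid(lo=0.0, hi=10.0, n_bins=100)
tokens = to_tokens("gaussian_blur", "sigma", 2.37, grid)
deg_type, key, sigma_hat = from_tokens(tokens, grid)

# Round-trip error never exceeds half a grid step for in-range values.
assert abs(sigma_hat - 2.37) <= grid.step / 2 + 1e-12
```

In this toy setup a finer grid (larger `n_bins`) tightens the bound on the value error at the cost of a larger value vocabulary. How the paper trades these off is not stated in the excerpt above.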

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis