CLEAR: Unlocking Generative Potential for Degraded Image Understanding in Unified Multimodal Models

Xiangzhao Hao, Zefeng Zhang, Zhenyu Zhang, Linhao Yu, Yao Chen, Yiqian Zhang, Haiyun Guo, Shuohuan Wang, and Yu Sun

arXiv:2604.04780·cs.CV·April 7, 2026

TL;DR

CLEAR improves how unified multimodal models understand degraded images by coupling generation with reasoning through progressive training, which yields better robustness on degraded inputs and higher perceptual quality in the intermediate visual states.

Contribution

The paper introduces CLEAR, a framework that connects generation and reasoning in unified multimodal models through supervised fine-tuning, a latent bridge, and reinforcement learning, improving the understanding of degraded images.

Findings

01 CLEAR significantly boosts robustness on degraded images.

02 Removing pixel-level supervision improves the perceptual quality of visual states.

03 The approach maintains performance on clean images while enhancing degraded input understanding.

Abstract

Image degradation from blur, noise, compression, and poor illumination severely undermines multimodal understanding in real-world settings. Unified multimodal models that combine understanding and generation within a single architecture are a natural fit for this challenge, as their generative pathway can model the fine-grained visual structure that degradation destroys. Yet these models fail to leverage their own generative capacity on degraded inputs. We trace this disconnect to two compounding factors: existing training regimes never ask the model to invoke generation during reasoning, and the standard decode-reencode pathway does not support effective joint optimization. We present CLEAR, a framework that connects the two capabilities through three progressive steps: (1) supervised fine-tuning on a degradation-aware dataset to establish the generate-then-answer reasoning pattern;…
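The four degradation families named in the abstract (blur, noise, compression, poor illumination) can be illustrated with a minimal sketch. This is not the paper's degradation pipeline, nor how MMD-Bench was built; all function names and parameter values are hypothetical, images are plain grayscale grids with values in [0, 1], and intensity quantization is used only as a crude stand-in for compression artifacts.

```python
import random
from typing import Callable, Dict, List

Image = List[List[float]]  # grayscale image, pixel values in [0, 1]


def clamp(v: float) -> float:
    """Keep a pixel value inside the valid [0, 1] range."""
    return min(1.0, max(0.0, v))


def box_blur(img: Image, radius: int = 1) -> Image:
    """Blur: replace each pixel with the mean of its (clipped) neighborhood."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def add_gaussian_noise(img: Image, sigma: float = 0.05, seed: int = 0) -> Image:
    """Noise: add zero-mean Gaussian noise, seeded for reproducibility."""
    rng = random.Random(seed)
    return [[clamp(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def quantize(img: Image, levels: int = 4) -> Image:
    """Compression stand-in (assumption): snap pixels to a few intensity levels."""
    return [[round(p * (levels - 1)) / (levels - 1) for p in row] for row in img]


def darken(img: Image, gamma: float = 2.5) -> Image:
    """Poor illumination: gamma > 1 pushes mid-tones toward black."""
    return [[p ** gamma for p in row] for row in img]


# Hypothetical registry mapping a degradation name to its function.
DEGRADATIONS: Dict[str, Callable[[Image], Image]] = {
    "blur": box_blur,
    "noise": add_gaussian_noise,
    "compression": quantize,
    "low_light": darken,
}


def degrade(img: Image, kind: str) -> Image:
    """Apply one named degradation with its default parameters."""
    return DEGRADATIONS[kind](img)
```

Calling `degrade(img, "blur")` on a small test grid returns a new grid of the same size; a real pipeline would more likely use an image library and an actual codec such as JPEG for the compression case.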

Code & Models

Repositories

haoxiangzhao12138/CLEAR (GitHub)

Models

CUDAOUTOFMEMORY/CLEAR (model, 8 downloads)

Datasets

CUDAOUTOFMEMORY/MMD-Bench (dataset, 290 downloads)
