CLEAR: Unlocking Generative Potential for Degraded Image Understanding in Unified Multimodal Models
Xiangzhao Hao, Zefeng Zhang, Zhenyu Zhang, Linhao Yu, Yao Chen, Yiqian Zhang, Haiyun Guo, Shuohuan Wang, and Yu Sun

TL;DR
CLEAR improves how well unified multimodal models understand degraded images. It links image generation and reasoning through progressive training, which yields greater robustness and better visual quality.
Contribution
The paper introduces CLEAR, a framework that connects generation and reasoning in unified multimodal models. It does so through supervised fine-tuning, a latent bridge, and reinforcement learning, which together improve the understanding of degraded images.
Findings
CLEAR significantly boosts robustness on degraded images.
Removing pixel-level supervision improves the perceptual quality of the visual states.
The approach maintains performance on clean images while improving understanding of degraded inputs.
Abstract
Image degradation from blur, noise, compression, and poor illumination severely undermines multimodal understanding in real-world settings. Unified multimodal models that combine understanding and generation within a single architecture are a natural fit for this challenge, as their generative pathway can model the fine-grained visual structure that degradation destroys. Yet these models fail to leverage their own generative capacity on degraded inputs. We trace this disconnect to two compounding factors: existing training regimes never ask the model to invoke generation during reasoning, and the standard decode-reencode pathway does not support effective joint optimization. We present CLEAR, a framework that connects the two capabilities through three progressive steps: (1) supervised fine-tuning on a degradation-aware dataset to establish the generate-then-answer reasoning pattern;…
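The abstract names four degradation types (blur, noise, compression, and poor illumination) and a degradation-aware training set, but this excerpt does not say how that set is built. The sketch below shows one common way to produce such inputs: apply synthetic degradations to clean images and record which one was used. It is a minimal sketch under stated assumptions and not the authors' pipeline. All function names (`box_blur`, `add_gaussian_noise`, `quantize`, `low_light`, `degrade`) are hypothetical, and the parameter ranges are arbitrary. Intensity quantization is only a crude stand-in for real compression, where a practical pipeline would re-encode the image as JPEG.

```python
import numpy as np

# Hypothetical synthetic degradations; images are float arrays in [0, 1],
# shaped HxW or HxWxC. Not the CLEAR authors' data pipeline.

def box_blur(img: np.ndarray, k: int = 5) -> np.ndarray:
    """Separable k x k box blur with edge padding (k must be odd)."""
    assert k % 2 == 1, "odd kernel size keeps the output shape unchanged"
    pad = k // 2
    pad_width = [(pad, pad), (pad, pad)] + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img, pad_width, mode="edge")
    kernel = np.ones(k) / k
    # Blur along the width axis, then along the height axis.
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, out)

def add_gaussian_noise(img: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Additive Gaussian sensor-style noise, clipped back to [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def quantize(img: np.ndarray, levels: int = 8) -> np.ndarray:
    """Crude compression proxy: reduce each pixel to `levels` intensity steps."""
    return np.round(img * (levels - 1)) / (levels - 1)

def low_light(img: np.ndarray, gain: float = 0.4, gamma: float = 1.5) -> np.ndarray:
    """Poor illumination: darken with a gain and a gamma curve."""
    return np.clip(gain * img ** gamma, 0.0, 1.0)

# Each entry samples its own strength, so repeated calls vary the severity.
DEGRADATIONS = {
    "blur": lambda img, rng: box_blur(img, k=int(rng.choice([3, 5, 7]))),
    "noise": lambda img, rng: add_gaussian_noise(img, float(rng.uniform(0.05, 0.2)), rng),
    "compression": lambda img, rng: quantize(img, levels=int(rng.integers(4, 17))),
    "low_light": lambda img, rng: low_light(img, gain=float(rng.uniform(0.2, 0.6))),
}

def degrade(img: np.ndarray, rng: np.random.Generator) -> tuple[np.ndarray, str]:
    """Pick one degradation at random; return the degraded image and its label."""
    name = str(rng.choice(sorted(DEGRADATIONS)))
    return DEGRADATIONS[name](img, rng), name

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.random((32, 32, 3))
    degraded, kind = degrade(clean, rng)
    print(kind, degraded.shape)
```

In a training set of this kind, a clean image, its degraded copy, and the degradation label could be stored together. The clean image would then serve as a possible target for the "generate" step, and the question and answer for the "answer" step. That pairing is an assumption about how such data might be used, not a description of the paper's dataset.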
