Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding
Jiaqi Tang, Jianmin Chen, Wei Wei, Xiaogang Xu, Runtao Liu, Xiangyu Wu, Qipeng Xie, Jiafei Wu, Lei Zhang, Qifeng Chen

TL;DR
Robust-R1 introduces a structured reasoning framework for multimodal models that explicitly models visual degradations, significantly improving robustness against real-world visual impairments through specialized training and adaptive reasoning strategies.
Contribution
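The paper presents Robust-R1, a degradation-aware reasoning framework that models visual degradations explicitly through structured reasoning chains. It combines supervised fine-tuning, reward-driven alignment, and dynamic reasoning depth scaling, and introduces a specialized 11K dataset of realistic synthesized degradations.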
Findings
Outperforms all baselines on the R-Bench degradation benchmark
Maintains superior performance under multi-intensity adversarial degradations
Achieves state-of-the-art robustness in real-world visual scenarios
Abstract
Multimodal Large Language Models struggle to maintain reliable performance under extreme real-world visual degradations, which impede their practical robustness. Existing robust MLLMs predominantly rely on implicit training/adaptation that focuses solely on visual encoder generalization, suffering from limited interpretability and isolated optimization. To overcome these limitations, we propose Robust-R1, a novel framework that explicitly models visual degradations through structured reasoning chains. Our approach integrates: (i) supervised fine-tuning for degradation-aware reasoning foundations, (ii) reward-driven alignment for accurately perceiving degradation parameters, and (iii) dynamic reasoning depth scaling adapted to degradation intensity. To facilitate this approach, we introduce a specialized 11K dataset featuring realistic degradations synthesized across four critical…
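Components (ii) and (iii) of the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: this excerpt does not give the paper's reward function or its depth-scaling rule. Every name here (`Degradation`, `perception_reward`, `reasoning_budget`), the normalized intensity range, and the token limits are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Degradation:
    """Hypothetical record: a degradation type and an intensity in [0, 1]."""
    kind: str
    intensity: float


def perception_reward(predicted: Degradation, actual: Degradation) -> float:
    """Toy reward (an assumption, not the paper's formula).

    Returns 0 when the predicted degradation type is wrong; otherwise
    returns 1 minus the absolute intensity error.
    """
    if predicted.kind != actual.kind:
        return 0.0
    return 1.0 - abs(predicted.intensity - actual.intensity)


def reasoning_budget(intensity: float, min_tokens: int = 64, max_tokens: int = 512) -> int:
    """Toy depth rule (an assumption): interpolate linearly between a short
    and a long reasoning budget as the degradation gets stronger."""
    clamped = min(max(intensity, 0.0), 1.0)  # keep out-of-range inputs in [0, 1]
    return round(min_tokens + clamped * (max_tokens - min_tokens))
```

Under these assumptions, a correct type with a close intensity estimate earns a reward near 1, and a heavily degraded image is given a longer reasoning budget than a clean one.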
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
