Robust Diagram Reasoning: A Framework for Enhancing LVLM Performance on Visually Perturbed Scientific Diagrams
Minghao Zhou, Rafael Souza, Yaqian Hu, Luming Che

TL;DR
This paper introduces the Robust Diagram Reasoning (RDR) framework to evaluate and improve the robustness of large vision-language models (LVLMs) to visual perturbations in scientific diagrams, a gap that current benchmarks largely overlook, with the aim of making these models more applicable to real-world documents.
Contribution
The paper presents the RDR framework with its Adaptive Multi-View & Consistency Verification (AMCV) mechanism, new robustness metrics, and a large-scale dataset of perturbed scientific diagrams, advancing both the robustness evaluation and the performance of LVLMs.
Findings
LVLMs' performance drops significantly under visual perturbations.
The RDR framework improves robustness metrics compared to baseline models.
New metrics PRS and VDC effectively quantify robustness improvements.
Abstract
Large Language Models (LLMs) and their multimodal variants (LVLMs) hold immense promise for scientific and engineering applications, particularly in processing visual information like scientific diagrams. However, their practical deployment is hindered by a critical lack of robustness to common visual perturbations such as noise, blur, and occlusions, which are prevalent in real-world scientific documents. Existing evaluation benchmarks largely overlook this challenge, leaving the robust reasoning capabilities of LVLMs on visually degraded scientific diagrams underexplored. To address this, we introduce the Robust Diagram Reasoning (RDR) framework, a novel approach designed to enhance and rigorously evaluate LVLMs' performance under such conditions. At its core, RDR employs an Adaptive Multi-View & Consistency Verification (AMCV) mechanism, which involves generating multiple perturbed…
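To make the perturbation types named in the abstract (noise, blur, occlusion) concrete, the following is a minimal sketch of how several perturbed views of one diagram could be produced. It is not the authors' implementation: the function names, the parameter values, and the choice of a grayscale image stored as a float array in [0, 1] are all assumptions made for illustration, and the sketch stops at view generation because the abstract is truncated before it describes what AMCV does with the views.

```python
import numpy as np


def add_gaussian_noise(img, sigma=0.1, seed=0):
    """Add zero-mean Gaussian noise and clip back to [0, 1] (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    noisy = img + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0.0, 1.0)


def box_blur(img, k=3):
    """Blur with a k x k mean filter (k odd), replicating edge pixels at the border."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    h, w = img.shape
    # Sum the k*k shifted copies of the image, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def occlude(img, top, left, height, width, fill=0.0):
    """Cover a rectangular region with a constant value."""
    out = img.copy()
    out[top:top + height, left:left + width] = fill
    return out


def make_views(img):
    """Return one perturbed view per perturbation type (illustrative settings only)."""
    h, w = img.shape
    return {
        "noise": add_gaussian_noise(img, sigma=0.1),
        "blur": box_blur(img, k=3),
        "occlusion": occlude(img, 0, 0, h // 2, w // 2),
    }


if __name__ == "__main__":
    diagram = np.full((8, 8), 0.5)  # stand-in for a real diagram image
    for name, view in make_views(diagram).items():
        print(name, view.shape, float(view.min()), float(view.max()))
```

Each view keeps the shape of the input, so the same model and prompt can be run on every view and the answers compared afterwards.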
