R-Bench: Are your Large Multimodal Model Robust to Real-world Corruptions?

Chunyi Li; Jianbo Zhang; Zicheng Zhang; Haoning Wu; Yuan Tian; Wei Sun; Guo Lu; Xiaohong Liu; Xiongkuo Min; Weisi Lin; Guangtao Zhai

arXiv:2410.05474·cs.CV·October 10, 2024

TL;DR

R-Bench is a comprehensive benchmark designed to evaluate the robustness of large multimodal models against real-world image corruptions, highlighting their performance gaps compared to human perception.

Contribution

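The paper introduces R-Bench, which models 33 corruption dimensions along the path from user capture to LMM reception, collects a reference/distorted image dataset with 2,970 human-labeled question-answer pairs, and benchmarks 20 LMMs to assess their real-world robustness.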

Findings

01. LMMs perform well on original images but poorly on distorted ones

02. A significant robustness gap exists between LMMs and human perception

03. R-Bench provides a new standard for evaluating real-world robustness

Abstract

The outstanding performance of Large Multimodal Models (LMMs) has led to their wide application in vision-related tasks. However, various corruptions in the real world mean that images will not be as ideal as in simulations, presenting significant challenges for the practical application of LMMs. To address this issue, we introduce R-Bench, a benchmark focused on the **Real-world Robustness of LMMs**. Specifically, we: (a) model the complete link from user capture to LMM reception, comprising 33 corruption dimensions, organized into 7 steps according to the corruption sequence and 7 groups based on low-level attributes; (b) collect a dataset of reference/distorted images before/after corruption, including 2,970 question-answer pairs with human labeling; (c) propose a comprehensive evaluation of absolute/relative robustness and benchmark 20 mainstream LMMs. Results show that while LMMs can correctly…
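The abstract is truncated here and does not define how absolute and relative robustness are computed. As a minimal sketch under stated assumptions, the snippet below takes absolute robustness to be accuracy on the distorted images, and relative robustness to be that accuracy divided by accuracy on the matching reference images. Both definitions, along with the `Sample` type and the `robustness` function, are hypothetical illustrations and not the paper's metrics or code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    """One question-answer pair evaluated on a reference/distorted image pair."""
    answer: str          # human-labeled ground truth
    pred_reference: str  # model answer given the reference image
    pred_distorted: str  # model answer given the distorted image


def accuracy(preds: List[str], answers: List[str]) -> float:
    """Fraction of predictions that exactly match the ground truth."""
    if not answers:
        raise ValueError("need at least one sample")
    return sum(p == a for p, a in zip(preds, answers)) / len(answers)


def robustness(samples: List[Sample]) -> Dict[str, float]:
    """Assumed definitions (not taken from the paper):
    absolute = accuracy on distorted images,
    relative = distorted accuracy / reference accuracy.
    """
    answers = [s.answer for s in samples]
    acc_ref = accuracy([s.pred_reference for s in samples], answers)
    acc_dist = accuracy([s.pred_distorted for s in samples], answers)
    # Guard against division by zero when the model fails every reference image.
    relative = acc_dist / acc_ref if acc_ref > 0 else 0.0
    return {"reference": acc_ref, "absolute": acc_dist, "relative": relative}


if __name__ == "__main__":
    demo = [
        Sample("A", "A", "A"),
        Sample("B", "B", "C"),
        Sample("C", "C", "C"),
        Sample("D", "D", "A"),
    ]
    print(robustness(demo))
```

Under these assumptions, a model that answers every reference image correctly but only half of the distorted ones scores 0.5 on both measures. The two diverge once reference accuracy drops below 1, which is why reporting both can be informative.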

Code & Models

Repositories

q-future/r-bench (Official)

Models

tuandunghcmut/vlmeval (Hugging Face model)

Taxonomy

Topics: Imbalanced Data Classification Techniques