Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs

Jie Zhang; Zhongqi Wang; Mengqi Lei; Zheng Yuan; Bei Yan; Shiguang Shan; Xilin Chen

arXiv:2406.18849 · cs.CV · February 25, 2025

TL;DR

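Dysca is a dynamic, scalable benchmark that uses synthesized images to evaluate the perception abilities of LVLMs across 51 image styles, 20 perception subtasks, and multiple scenarios and question types. It addresses two limitations of existing benchmarks: potential data leakage from reused images, and a narrow focus on realistic-style images in clean scenarios.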

Contribution

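We introduce Dysca, a benchmark that uses Stable Diffusion and a rule-based method to dynamically generate novel images, questions, and answers. This lets it evaluate LVLMs on multi-stylized images and in noisy scenarios rather than only on realistic, clean images drawn from existing datasets.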

Findings

01

Current LVLMs show notable perception limitations.

02

Dysca reveals weaknesses in models under noisy and stylized scenarios.

03

The benchmark is scalable and adaptable for future evaluations.

Abstract

Many benchmarks have been proposed to evaluate the perception ability of Large Vision-Language Models (LVLMs). However, most of them construct questions by selecting images from existing datasets, which risks data leakage. In addition, these benchmarks focus only on realistic-style images and clean scenarios, leaving multi-stylized images and noisy scenarios unexplored. To address these challenges, we propose Dysca, a dynamic and scalable benchmark for evaluating LVLMs with synthesized images. Specifically, we use Stable Diffusion together with a rule-based method to dynamically generate novel images, questions, and the corresponding answers. We consider 51 image styles and evaluate perception capability across 20 subtasks. Moreover, we conduct evaluations under 4 scenarios (i.e., Clean, Corruption, Print…
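The abstract's key idea is that images, questions, and answers are generated together from rules, so the ground truth is known by construction and no image comes from an existing dataset. The sketch below illustrates that idea for a single hypothetical "object color" question. It is not Dysca's actual pipeline. The vocabularies (`STYLES`, `COLORS`, `OBJECTS`), the `Sample` type, the prompt template, and `make_color_sample` are all assumptions made for illustration. The call to Stable Diffusion that would render `prompt` into an image is deliberately omitted.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical vocabularies for illustration only; Dysca's real metadata
# (51 styles, 20 subtasks) is not reproduced here.
STYLES = ["oil painting", "pixel art", "watercolor", "photorealistic"]
COLORS = ["red", "blue", "green", "yellow"]
OBJECTS = ["car", "cat", "umbrella", "bicycle"]


@dataclass
class Sample:
    prompt: str         # text prompt that would be passed to a text-to-image model
    question: str       # question posed to the LVLM about the generated image
    options: List[str]  # multiple-choice options, including the correct answer
    answer: str         # ground truth, known because it was used to build the prompt


def make_color_sample(rng: random.Random) -> Sample:
    """Hypothetical rule: pick (style, color, object), then derive prompt, question, and answer."""
    style = rng.choice(STYLES)
    color = rng.choice(COLORS)
    obj = rng.choice(OBJECTS)
    prompt = f"a {color} {obj}, {style} style"
    # Distractors are the other colors, so every option is distinct.
    distractors = rng.sample([c for c in COLORS if c != color], 3)
    options = distractors + [color]
    rng.shuffle(options)
    question = f"What color is the {obj} in the image?"
    return Sample(prompt, question, options, color)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so a given benchmark draw is reproducible
    for _ in range(3):
        s = make_color_sample(rng)
        print(s.prompt, "|", s.question, s.options, "->", s.answer)
```

Because each sample is drawn fresh from rules, a new seed yields a new set of test items. This is one way a benchmark can stay "dynamic" and reduce the chance that test images appeared in a model's training data.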

Taxonomy

Topics: Industrial Vision Systems and Defect Detection

Methods: Focus · Diffusion