Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs
Jie Zhang, Zhongqi Wang, Mengqi Lei, Zheng Yuan, Bei Yan, Shiguang Shan, Xilin Chen

TL;DR
Dysca is a dynamic, scalable benchmark utilizing synthesized images to comprehensively evaluate the perception abilities of LVLMs across various styles, scenarios, and question types, addressing limitations of existing datasets.
Contribution
We introduce Dysca, a novel benchmark that uses generative image synthesis to evaluate LVLMs in diverse, challenging scenarios, enhancing evaluation flexibility and scope.
Findings
Current LVLMs show notable perception limitations.
Dysca reveals weaknesses in models under noisy and stylized scenarios.
The benchmark is scalable and adaptable for future evaluations.
Abstract
Many benchmarks have been proposed to evaluate the perception ability of Large Vision-Language Models (LVLMs). However, most of them construct questions by selecting images from existing datasets, which risks data leakage. Moreover, these benchmarks focus on realistic-style images and clean scenarios, leaving multi-stylized images and noisy scenarios unexplored. In response to these challenges, we propose Dysca, a dynamic and scalable benchmark that evaluates LVLMs using synthesized images. Specifically, we leverage Stable Diffusion and design a rule-based method to dynamically generate novel images, questions, and the corresponding answers. We consider 51 image styles and evaluate perception capability across 20 subtasks. Moreover, we conduct evaluations under 4 scenarios (i.e., Clean, Corruption, Print…
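The rule-based generation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the metadata pools, the `Sample` type, the `make_sample` helper, and the question wording are all hypothetical, and the step that would pass the prompt to Stable Diffusion is left out. The idea shown is only that the text prompt and the ground-truth answer come from the same sampled metadata, so the answer is known before any image exists.

```python
import random
from dataclasses import dataclass

# Hypothetical metadata pools, for illustration only.
# The abstract reports 51 styles and 20 subtasks; these lists are stand-ins.
STYLES = ["oil painting", "pixel art", "watercolor"]
ANIMALS = ["cat", "dog", "horse", "owl"]


@dataclass
class Sample:
    prompt: str         # text that would be sent to a text-to-image model
    question: str       # multiple-choice question about the generated image
    options: list[str]  # shuffled answer options
    answer: str         # ground truth, known from the sampled metadata


def make_sample(rng: random.Random) -> Sample:
    """Draw metadata, then derive prompt, question, and answer from it."""
    style = rng.choice(STYLES)
    subject = rng.choice(ANIMALS)
    prompt = f"{subject}, {style} style"

    # Distractors are drawn from the same pool, excluding the true subject.
    distractors = rng.sample([a for a in ANIMALS if a != subject], 2)
    options = distractors + [subject]
    rng.shuffle(options)

    return Sample(
        prompt=prompt,
        question="Which animal is shown in the image?",
        options=options,
        answer=subject,
    )


if __name__ == "__main__":
    sample = make_sample(random.Random(0))
    print(sample.prompt)
    print(sample.question, sample.options, "->", sample.answer)
```

Because every sample is drawn fresh from the pools, new test items can be produced on demand rather than reused from an existing dataset, which is the property the abstract relies on to avoid data leakage.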
Taxonomy
Topics: Industrial Vision Systems and Defect Detection
Methods: Focus · Diffusion
