ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks

Zejun Li; Ye Wang; Mengfei Du; Qingwen Liu; Binhao Wu; Jiwen Zhang; Chengxing Zhou; Zhihao Fan; Jie Fu; Jingjing Chen; Xuanjing Huang; Zhongyu Wei

arXiv:2310.02569·cs.CV·October 18, 2023·1 cites

Open Access · 1 Repo

TL;DR

ReForm-Eval introduces a unified benchmark for evaluating large vision-language models by reformulating existing benchmarks into LVLM-compatible formats, enabling comprehensive and automated assessment of their capabilities.

Contribution

The paper proposes a systematic method to reformulate existing benchmarks into a unified format for LVLM evaluation, reducing manual effort and enabling comprehensive assessment.

Findings

01. Extensive experiments reveal strengths and weaknesses of current LVLMs.

02. ReForm-Eval provides a large, unified dataset for LVLM evaluation.

03. Analysis uncovers key factors influencing LVLM performance.

Abstract

Recent years have witnessed remarkable progress in the development of large vision-language models (LVLMs). Benefiting from strong language backbones and efficient cross-modal alignment strategies, LVLMs exhibit surprising capabilities to perceive visual signals and perform visually grounded reasoning. However, the capabilities of LVLMs have not been comprehensively and quantitatively evaluated. Most existing multi-modal benchmarks require task-oriented input-output formats, which makes it difficult to automatically assess the free-form text output of LVLMs. To effectively leverage the annotations available in existing benchmarks and reduce the manual effort required for constructing new benchmarks, we propose to re-formulate existing benchmarks into unified LVLM-compatible formats. Through systematic data collection and reformulation, we present the ReForm-Eval benchmark, offering…
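As a rough illustration of what re-formulating a task-oriented sample into an LVLM-compatible format could look like, the sketch below turns a labelled sample into a multiple-choice prompt and recovers an option letter from a free-form response. The multiple-choice format, the prompt wording, and the names `MultipleChoiceItem`, `reformulate`, and `extract_choice` are assumptions made for illustration only; the truncated abstract above does not specify the formats ReForm-Eval actually uses, and this is not the authors' implementation.

```python
import re
import string
from dataclasses import dataclass


@dataclass
class MultipleChoiceItem:
    """Hypothetical container for one reformulated sample."""
    prompt: str
    answer_letter: str


def reformulate(question, label, distractors, rotate=0):
    """Build a multiple-choice prompt from a labelled sample (illustrative only).

    `rotate` shifts the option order deterministically so the gold
    label does not always sit in the first position.
    """
    options = [label] + list(distractors)
    k = rotate % len(options)
    options = options[k:] + options[:k]
    letters = string.ascii_uppercase[:len(options)]
    lines = [f"Question: {question}", "Options:"]
    lines += [f"({letter}) {option}" for letter, option in zip(letters, options)]
    lines.append("Answer with the option letter.")
    return MultipleChoiceItem("\n".join(lines), letters[options.index(label)])


def extract_choice(response, num_options):
    """Return the first standalone option letter in a free-form response, or None."""
    letters = string.ascii_uppercase[:num_options]
    match = re.search(rf"\b([{letters}])\b", response)
    return match.group(1) if match else None


item = reformulate("What animal is shown?", "dog", ["cat", "bird", "fish"], rotate=1)
predicted = extract_choice("The answer is (D).", num_options=4)
is_correct = predicted == item.answer_letter
```

The letter matching here is deliberately simple: a response that begins with a standalone capital "A" used as an article would be misread as option A, so a real evaluation pipeline would need a stricter answer format or a more careful parser.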

Code & Models

Repositories

fudandisc/reform-eval (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Genomics and Phylogenetic Studies