LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
Peng Xu, Wenqi Shao, Kaipeng Zhang, Peng Gao, Shuo Liu, Meng Lei, Fanqing Meng, Siyuan Huang, Yu Qiao, Ping Luo

TL;DR
This paper introduces LVLM-eHub, a comprehensive benchmark for evaluating large vision-language models across multiple capabilities and scenarios. It reveals overfitting and hallucination issues and proposes solutions for better assessment.
Contribution
It presents a new holistic evaluation framework and benchmark for LVLMs, including diverse tests and an online arena, to better understand their capabilities and limitations.
Findings
Instruction-tuned LVLMs overfit tasks and generalize poorly.
LVLMs tuned on moderate amounts of instruction-following data can suffer from object hallucination.
Multi-turn reasoning evaluation helps mitigate hallucination problems.
Abstract
Large Vision-Language Models (LVLMs) have recently played a dominant role in multimodal vision-language learning. Despite their great success, a holistic evaluation of their efficacy is still lacking. This paper presents a comprehensive evaluation of publicly available large multimodal models by building an LVLM evaluation Hub (LVLM-eHub). Our LVLM-eHub consists of representative LVLMs such as InstructBLIP and MiniGPT-4, which are thoroughly evaluated through a quantitative capability evaluation and an online arena platform. The former evaluates categories of multimodal capabilities of LVLMs, such as visual question answering and embodied artificial intelligence, on standard text-related visual benchmarks. The latter provides user-level evaluation of LVLMs in an open-world question-answering scenario. The study reveals several innovative findings. First, instruction-tuned LVLM with…
Taxonomy
Topics: Multimodal Machine Learning Applications
