Towards Flexible Evaluation for Generative Visual Question Answering

Huishan Ji; Qingyi Si; Zheng Lin; Weiping Wang

arXiv:2408.00300·cs.CV·August 2, 2024

TL;DR

This paper proposes semantics-based evaluation for VQA: a flexible evaluator that scores unconstrained open-ended responses, outperforms existing semantic evaluators, and allows a fairer assessment of multimodal models.

Contribution

The paper develops the Semantically Flexible VQA Evaluator (SFVE) and a dataset for analyzing VQA evaluators, addressing the limitations of Exact Match metrics.

Findings

01. SFVE surpasses existing semantic evaluators significantly.
02. Model-based evaluation is feasible and effective.
03. The training scheme generalizes across different encoder architectures.

Abstract

Throughout the rapid development of multimodal large language models (MLLMs), a crucial ingredient is a fair and accurate evaluation of their multimodal comprehension abilities. Although Visual Question Answering (VQA) could serve as a developed test field, limitations of VQA evaluation, like the inflexible pattern of Exact Match, have hindered MLLMs from demonstrating their real capability and discouraged rich responses. Therefore, this paper proposes the use of semantics-based evaluators for assessing unconstrained open-ended responses on VQA datasets. As characteristics of VQA have made such evaluation significantly different from the traditional Semantic Textual Similarity (STS) task, to systematically analyze the behaviour and compare the performance of various evaluators including LLM-based ones, we propose three key properties, i.e., Alignment, Consistency and Generalization, and a…

Code & Models

Repositories

jihuishan/flexible_evaluation_for_vqa_mm24 (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Speech and dialogue systems