MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

Yusu Qian; Hanrong Ye; Jean-Philippe Fauconnier; Peter Grasch; Yinfei Yang; Zhe Gan

arXiv:2407.01509·cs.CV·March 21, 2025·1 cites

TL;DR

MIA-Bench is a comprehensive benchmark with 400 challenging image-prompt pairs designed to evaluate and improve the instruction-following capabilities of multimodal large language models, emphasizing adherence to complex instructions.

Contribution

The paper introduces MIA-Bench, a new benchmark for assessing how strictly MLLMs adhere to complex instructions, and explores supervised fine-tuning on additional training data to improve compliance without compromising performance on other tasks.

Findings

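01 State-of-the-art MLLMs vary significantly in how well they follow layered instructions.

02 Supervised fine-tuning on additional training data is explored as a way to improve instruction adherence without compromising performance on other tasks.

03 The benchmark is intended to guide future developments in MLLM training methods.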

Abstract

We introduce MIA-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions. Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to challenge the models' compliance with layered instructions in generating accurate responses that satisfy specific requested patterns. Evaluation results from a wide array of state-of-the-art MLLMs reveal significant variations in performance, highlighting areas for improvement in instruction fidelity. Additionally, we create extra training data and explore supervised fine-tuning to enhance the models' ability to strictly follow instructions without compromising performance on other tasks. We hope this benchmark not only serves as a tool for measuring MLLM adherence to instructions, but also guides future developments in MLLM training methods.
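
The abstract does not spell out how adherence to layered instructions is scored. As a minimal sketch only, assume each prompt is decomposed into sub-instructions, each with a weight and a score in [0, 1] assigned by some judge; the names `SubInstruction`, `adherence_score`, and `benchmark_score` are hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass


@dataclass
class SubInstruction:
    """One component of a layered prompt (hypothetical structure)."""
    description: str
    weight: float  # relative importance of this component (assumed)
    score: float   # judge-assigned adherence in [0, 1] (assumed)


def adherence_score(parts):
    """Weighted mean adherence for one image-prompt pair."""
    total_weight = sum(p.weight for p in parts)
    if total_weight <= 0:
        raise ValueError("sub-instruction weights must sum to a positive value")
    return sum(p.weight * p.score for p in parts) / total_weight


def benchmark_score(per_prompt):
    """Unweighted mean of per-prompt adherence across the benchmark."""
    if not per_prompt:
        raise ValueError("no prompts to score")
    return sum(adherence_score(parts) for parts in per_prompt) / len(per_prompt)


# Illustrative data only; these are not prompts or results from MIA-Bench.
example_prompt = [
    SubInstruction("describe the image", weight=2.0, score=1.0),
    SubInstruction("use exactly two sentences", weight=1.0, score=0.5),
    SubInstruction("do not mention colors", weight=1.0, score=0.0),
]
fully_followed = [SubInstruction("list three objects", weight=1.0, score=1.0)]
```

Under these assumptions, a response that satisfies the heavily weighted component but misses a lightly weighted constraint still earns partial credit, which is one way to capture the "layered" nature of the instructions described above.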

Code & Models

Repositories

apple/ml-mia-bench (Official)

Models

tuandunghcmut/vlmeval (Hugging Face model)

Taxonomy

Topics: Natural Language Processing Techniques · Translation Studies and Practices · Linguistics and Terminology Studies

Methods: Sparse Evolutionary Training