MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier, Peter Grasch, Yinfei Yang, Zhe Gan

TL;DR
MIA-Bench is a comprehensive benchmark with 400 challenging image-prompt pairs designed to evaluate and improve the instruction-following capabilities of multimodal large language models, emphasizing adherence to complex instructions.
Contribution
The paper introduces MIA-Bench, a new benchmark for assessing instruction adherence in MLLMs, and explores supervised fine-tuning on additional training data to improve adherence without compromising performance on other tasks.
Findings
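State-of-the-art MLLMs vary significantly in how strictly they follow complex instructions.
Supervised fine-tuning on additional training data improves instruction adherence without compromising performance on other tasks.
The authors hope the benchmark will guide future MLLM training methods.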
Abstract
We introduce MIA-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions. Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to challenge the models' compliance with layered instructions in generating accurate responses that satisfy specific requested patterns. Evaluation results from a wide array of state-of-the-art MLLMs reveal significant variations in performance, highlighting areas for improvement in instruction fidelity. Additionally, we create extra training data and explore supervised fine-tuning to enhance the models' ability to strictly follow instructions without compromising performance on other tasks. We hope this benchmark not only serves as a tool for measuring MLLM adherence to instructions, but also guides future developments in MLLM training methods.
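As an illustration of what scoring compliance with layered instructions might look like, the sketch below splits a prompt into sub-instructions and combines per-component judgments into one adherence score. This is not the authors' scoring procedure, which the abstract does not describe. The names `SubInstruction` and `adherence_score`, the weights, and the example prompt components are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubInstruction:
    """One component of a layered prompt (hypothetical structure)."""
    description: str
    weight: float              # relative importance, chosen by the benchmark designer
    satisfied_fraction: float  # 0.0 to 1.0, as judged by a human or model grader


def adherence_score(components: List[SubInstruction]) -> float:
    """Weighted average of per-component judgments, in the range [0, 1]."""
    total_weight = sum(c.weight for c in components)
    if total_weight <= 0:
        raise ValueError("component weights must sum to a positive value")
    return sum(c.weight * c.satisfied_fraction for c in components) / total_weight


# Assumed example: a prompt with three layered requirements.
example = [
    SubInstruction("describe the image", weight=4.0, satisfied_fraction=1.0),
    SubInstruction("use exactly two sentences", weight=3.0, satisfied_fraction=0.0),
    SubInstruction("mention the color of the sky", weight=3.0, satisfied_fraction=1.0),
]
score = adherence_score(example)
print(f"adherence: {score:.2f}")
```

Here a response can be accurate about the image yet still lose credit for ignoring a formatting constraint. The abstract describes this kind of gap as instruction fidelity.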
Taxonomy
Topics: Natural Language Processing Techniques · Translation Studies and Practices · linguistics and terminology studies
Methods: Sparse Evolutionary Training
