Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models

Mingwei Zhu, Leigang Sha, Yu Shu, Kangjia Zhao, Tiancheng Zhao, Jianwei Yin

arXiv:2310.13473 · cs.CV · October 23, 2023 · 1 citation

TL;DR

This paper introduces a new benchmark and evaluation methods for assessing the predictive reasoning capabilities of multimodal large language models across various scenarios, highlighting their strengths and weaknesses.

Contribution

It presents a novel benchmark and evaluation framework specifically designed to measure predictive reasoning in multimodal large language models, filling a significant research gap.

Findings

01. Current MLLMs show varied performance in predictive reasoning tasks.

02. The benchmark effectively differentiates model capabilities across diverse scenarios.

03. The LLM-powered evaluation methods provide robust quantification of models' future-prediction and reasoning abilities.

Abstract

Multimodal large language models (MLLMs) have shown great potential in perception and interpretation tasks, but their capabilities in predictive reasoning remain under-explored. To address this gap, we introduce a novel benchmark that assesses the predictive reasoning capabilities of MLLMs across diverse scenarios. Our benchmark targets three important domains: abstract pattern reasoning, human activity prediction, and physical interaction prediction. We further develop three evaluation methods powered by large language models to robustly quantify a model's performance in predicting and reasoning about the future based on multi-visual context. Empirical experiments confirm the soundness of the proposed benchmark and evaluation methods through rigorous testing and reveal the strengths and weaknesses of currently popular MLLMs in predictive reasoning. Lastly, our proposed benchmark provides a…
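The abstract does not spell out how its three LLM-powered evaluation methods work. As a rough illustration of the general LLM-as-judge pattern it alludes to, the sketch below runs a model over ordered visual contexts, scores each prediction against a reference with a pluggable judge, and averages the scores per domain. Everything here is an assumption: `Item`, `evaluate`, and `overlap_judge` are hypothetical names, not part of the released giraffe-bench code, frames are represented only by file names, and the word-overlap judge is a toy stand-in for an actual LLM call.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Item:
    """Hypothetical benchmark item: ordered frames plus a reference future."""
    domain: str          # e.g. "abstract pattern", "human activity", "physical interaction"
    frames: List[str]    # ordered visual context (file names in this sketch)
    reference: str       # ground-truth description of what happens next


# A judge maps (prediction, reference) to a score in [0, 1].
# In a real setup this would prompt an LLM; here it is injected as a function.
Judge = Callable[[str, str], float]
Predictor = Callable[[List[str]], str]


def evaluate(items: List[Item], predict: Predictor, judge: Judge) -> Dict[str, float]:
    """Return the mean judge score for each domain."""
    per_domain: Dict[str, List[float]] = {}
    for item in items:
        prediction = predict(item.frames)
        score = judge(prediction, item.reference)
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"judge returned out-of-range score {score}")
        per_domain.setdefault(item.domain, []).append(score)
    return {domain: mean(scores) for domain, scores in per_domain.items()}


def overlap_judge(prediction: str, reference: str) -> float:
    """Toy stand-in for an LLM judge: fraction of reference words in the prediction."""
    ref_words = set(reference.lower().split())
    pred_words = set(prediction.lower().split())
    return len(ref_words & pred_words) / len(ref_words) if ref_words else 0.0
```

Keeping the judge as an injected function separates the benchmark loop from the scoring model, so a real LLM-backed judge could replace `overlap_judge` without changing `evaluate`.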

Code & Models

Repositories

coderj-one/giraffe-bench (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling