ReMI: A Dataset for Reasoning with Multiple Images
Mehran Kazemi, Nishanth Dikkala, Ankit Anand, Petar Devic, Ishita Dasgupta, Fangyu Liu, Bahare Fatemi, Pranjal Awasthi, Dee Guo, Sreenivas Gollapudi, Ahmed Qureshi

TL;DR
ReMI is a new dataset designed to evaluate large language models' ability to perform reasoning tasks involving multiple images across various domains, revealing current limitations and guiding future improvements.
Contribution
This paper introduces ReMI, the first comprehensive dataset for multi-image reasoning in LLMs, covering diverse tasks and characteristics, and benchmarks several models to identify performance gaps.
Findings
Significant performance gap between LLMs and humans in multi-image reasoning.
Different models show varying strengths and weaknesses across reasoning tasks.
The ReMI dataset is publicly available for further research.
Abstract
With the continuous advancement of large language models (LLMs), it is essential to create new benchmarks to effectively evaluate their expanding capabilities and identify areas for improvement. This work focuses on multi-image reasoning, an emerging capability in state-of-the-art LLMs. We introduce ReMI, a dataset designed to assess LLMs' ability to Reason with Multiple Images. This dataset encompasses a diverse range of tasks, spanning various reasoning domains such as math, physics, logic, code, table/chart understanding, and spatial and temporal reasoning. It also covers a broad spectrum of characteristics found in multi-image reasoning scenarios. We have benchmarked several cutting-edge LLMs using ReMI and found a substantial gap between their performance and human-level proficiency. This highlights the challenges in multi-image reasoning and the need for further research. Our…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
