Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning
Deepanway Ghosal, Vernon Toh Yan Han, Chia Yew Ken, Soujanya Poria

TL;DR
This paper introduces AlgoPuzzleVQA, a new dataset for evaluating how well multimodal language models solve algorithmic puzzles that require both visual understanding and complex reasoning; current models perform poorly on it.
Contribution
The paper presents a novel dataset and task for multimodal puzzle solving, highlighting the challenges in integrating visual, language, and algorithmic reasoning in large language models.
Findings
Multimodal LLMs such as GPT-4V and Gemini perform close to random chance on the puzzles
The dataset covers diverse mathematical and algorithmic topics
The dataset can be scaled to increase reasoning complexity
Abstract
This paper introduces the novel task of multimodal puzzle solving, framed within the context of visual question-answering. We present a new dataset, AlgoPuzzleVQA, designed to challenge and evaluate the capabilities of multimodal language models in solving algorithmic puzzles that require visual understanding, language understanding, and complex algorithmic reasoning. We create the puzzles to encompass a diverse array of mathematical and algorithmic topics such as boolean logic, combinatorics, graph theory, optimization, search, etc., aiming to evaluate the gap between visual data interpretation and algorithmic problem-solving skills. The dataset is generated automatically from code authored by humans. All our puzzles have exact solutions that can be found from the algorithm without tedious human calculations. This ensures that our dataset can be scaled up arbitrarily in terms of…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multi-Agent Systems and Negotiation
