Seeing the Forest and the Trees: Solving Visual Graph and Tree Based Data Structure Problems using Large Multimodal Models
Sebastian Gutierrez, Irene Hou, Jihye Lee, Kenneth Angelikas, Owen Man, Sophia Mettille, James Prather, Paul Denny, Stephen MacNeil

TL;DR
This paper evaluates large multimodal models' ability to solve graph and tree data structure problems from images, introducing a new benchmark dataset and analyzing model performance and limitations.
Contribution
It presents a novel benchmark dataset for visual graph and tree problems and assesses the performance of various large multimodal models on this dataset.
Findings
GPT-4o achieved 87.6% accuracy on tree problems.
Gemini 1.5 Flash achieved 56.2% accuracy on graph problems.
Model performance is significantly affected by structural and visual variations.
Abstract
Recent advancements in generative AI systems have raised concerns about academic integrity among educators. Beyond excelling at solving programming problems and text-based multiple-choice questions, recent research has also found that large multimodal models (LMMs) can solve Parsons problems based only on an image. However, such problems are still inherently text-based and rely on the capabilities of the models to convert the images of code blocks to their corresponding text. In this paper, we further investigate the capabilities of LMMs to solve graph and tree data structure problems based only on images. To achieve this, we computationally construct and evaluate a novel benchmark dataset comprising 9,072 samples of diverse graph and tree data structure tasks to assess the performance of the GPT-4o, GPT-4v, Gemini 1.5 Pro, Gemini 1.5 Flash, Gemini 1.0 Pro Vision, and Claude 3 model…
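The abstract says the benchmark is constructed computationally but does not describe the pipeline. The sketch below is a hypothetical illustration of how one such tree task could be generated and scored. It builds a random binary search tree, records its edges (which a renderer would turn into the image shown to the model), computes a traversal as the ground-truth answer, and checks a model's comma-separated reply by exact match. The names `make_traversal_task` and `is_correct`, the task dictionary format, and the exact-match scoring rule are all assumptions, not the authors' method. Image rendering is omitted.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def bst_insert(root: Optional[Node], value: int) -> Node:
    """Insert value into a binary search tree and return the (possibly new) root."""
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = bst_insert(root.left, value)
    else:
        root.right = bst_insert(root.right, value)
    return root


def traverse(node: Optional[Node], order: str) -> list[int]:
    """Return node values in 'preorder', 'inorder', or 'postorder' order."""
    if node is None:
        return []
    left = traverse(node.left, order)
    right = traverse(node.right, order)
    if order == "preorder":
        return [node.value] + left + right
    if order == "inorder":
        return left + [node.value] + right
    if order == "postorder":
        return left + right + [node.value]
    raise ValueError(f"unknown traversal order: {order}")


def make_traversal_task(num_nodes: int, order: str, seed: int) -> dict:
    """Hypothetical sample: tree edges (to be rendered as an image) plus the expected answer."""
    rng = random.Random(seed)  # seeded so each sample is reproducible
    values = rng.sample(range(1, 100), num_nodes)  # distinct node labels
    root = None
    for v in values:
        root = bst_insert(root, v)
    # Collect parent -> child edges for a (not shown) image renderer.
    edges = []
    stack = [root]
    while stack:
        node = stack.pop()
        for child in (node.left, node.right):
            if child is not None:
                edges.append((node.value, child.value))
                stack.append(child)
    return {
        "prompt": f"Give the {order} traversal of the tree in the image.",
        "edges": edges,
        "answer": traverse(root, order),
    }


def is_correct(model_output: str, task: dict) -> bool:
    """Assumed scoring rule: exact match of a comma-separated answer."""
    try:
        predicted = [int(tok) for tok in model_output.replace(" ", "").split(",") if tok]
    except ValueError:
        return False  # non-numeric output counts as wrong
    return predicted == task["answer"]
```

Varying `num_nodes`, `order`, and `seed` over a grid is one plausible way to obtain many samples with controlled structural variation; visual variations (layout, colors, fonts) would be introduced at the rendering step, which this sketch leaves out.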
Taxonomy
Topics: Semantic Web and Ontologies · Data Management and Algorithms · Data Mining Algorithms and Applications
