Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads
Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Joanna Matthiesen, Kevin Smith, Joshua B. Tenenbaum

TL;DR
This paper evaluates large vision-and-language models on children's Olympiad math problems, revealing their reasoning strengths and limitations compared to human children, and analyzing differences in reasoning types.
Contribution
It introduces SMART-840, a dataset of Olympiad problems for evaluating AI mathematical reasoning, and provides a systematic analysis of LVLMs' capabilities across age-appropriate problems.
Findings
LVLMs improve with higher-grade problems
LVLMs struggle with problems for younger children
AI reasoning differs from children's cognitive processes
Abstract
Recent years have seen significant progress in the general-purpose problem-solving abilities of large vision and language models (LVLMs), such as ChatGPT, Gemini, etc.; some of these breakthroughs even seem to enable AI models to outperform human abilities in varied tasks that demand higher-order cognitive skills. Are the current large AI models indeed capable of generalized problem solving as humans do? A systematic analysis of AI capabilities for joint vision and text reasoning, however, is missing in the current scientific literature. In this paper, we make an effort towards filling this gap by evaluating state-of-the-art LVLMs on their mathematical and algorithmic reasoning abilities using visuo-linguistic problems from children's Olympiads. Specifically, we consider problems from the Mathematical Kangaroo (MK) Olympiad, which is a popular international competition targeted at…
Taxonomy
Topics: Educational Assessment and Pedagogy · Mathematics Education and Teaching Techniques
