Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads

Anoop Cherian; Kuan-Chuan Peng; Suhas Lohit; Joanna Matthiesen; Kevin Smith; Joshua B. Tenenbaum

arXiv:2406.15736·cs.LG·December 9, 2024

TL;DR

This paper evaluates large vision-and-language models on children's Olympiad math problems, revealing their reasoning strengths and limitations compared to human children, and analyzing differences in reasoning types.

Contribution

It introduces SMART-840, a dataset of Olympiad problems for evaluating AI mathematical reasoning, and provides a systematic analysis of LVLMs' capabilities across age-appropriate problems.

Findings

01 LVLMs perform better on problems written for higher grades

02 LVLMs struggle with problems written for younger children

03 AI reasoning differs from children's cognitive processes

Abstract

Recent years have seen significant progress in the general-purpose problem-solving abilities of large vision-and-language models (LVLMs) such as ChatGPT and Gemini; some of these breakthroughs even seem to enable AI models to outperform humans on varied tasks that demand higher-order cognitive skills. Are current large AI models indeed capable of generalized problem solving as humans are? A systematic analysis of AI capabilities for joint vision and text reasoning, however, is missing from the current scientific literature. In this paper, we make an effort towards filling this gap by evaluating state-of-the-art LVLMs on their mathematical and algorithmic reasoning abilities using visuo-linguistic problems from children's Olympiads. Specifically, we consider problems from the Mathematical Kangaroo (MK) Olympiad, which is a popular international competition targeted at…

Videos

Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads · SlidesLive

Taxonomy

Topics: Educational Assessment and Pedagogy · Mathematics Education and Teaching Techniques