TurtleBench: A Visual Programming Benchmark in Turtle Geometry

Sina Rismanchian; Yasaman Razeghi; Sameer Singh; Shayan Doroudi

arXiv:2411.00264 · cs.AI · April 15, 2025

TL;DR

TurtleBench is a new benchmark designed to evaluate large multimodal models' ability to interpret geometric patterns and generate code, revealing significant gaps in current AI capabilities compared to human understanding.

Contribution

The paper introduces TurtleBench, a novel benchmark based on turtle geometry to assess LMMs' visual reasoning and code generation in geometric tasks.

Findings

1. Leading LMMs perform poorly on TurtleBench tasks.
2. GPT-4o achieves only 19% accuracy on the simplest tasks.
3. Few-shot prompting yields less than 2% improvement.

Abstract

Humans have the ability to reason about geometric patterns in images and scenes from a young age. However, developing large multimodal models (LMMs) capable of similar reasoning remains a challenge, highlighting the need for robust evaluation methods to assess these capabilities. We introduce TurtleBench, a benchmark designed to evaluate LMMs' capacity to interpret geometric patterns (given visual examples, textual instructions, or both) and generate precise code outputs. Inspired by turtle geometry, a notion used to teach children foundational coding and geometric concepts, TurtleBench features tasks with patterned shapes that have underlying algorithmic logic. Our evaluation reveals that leading LMMs struggle significantly with these tasks: GPT-4o achieves only 19% accuracy on the simplest tasks, and few-shot prompting improves performance only marginally (< 2%). TurtleBench…
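To make "patterned shapes that have underlying algorithmic logic" concrete, the sketch below shows the kind of turtle-geometry program such a task calls for: a pen that moves forward and turns, with loops producing polygons and rotated copies. It is an illustration only, not the benchmark's harness or evaluation code. The `Turtle` class, `polygon`, and `rotated_squares` are hypothetical names; the class is a tiny headless stand-in that records vertices instead of drawing, so it runs without a display (Python's standard `turtle` module offers similarly named `forward`/`left`/`right` methods but needs a Tk window).

```python
import math
from dataclasses import dataclass, field


def _clean(v: float) -> float:
    # Round away floating-point noise; adding 0.0 turns -0.0 into 0.0.
    return round(v, 6) + 0.0


@dataclass
class Turtle:
    """Hypothetical headless turtle: tracks position/heading, records visited vertices."""
    x: float = 0.0
    y: float = 0.0
    heading: float = 0.0  # degrees; 0 faces +x, counterclockwise is positive
    path: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # The path starts at the turtle's initial position.
        self.path.append((_clean(self.x), _clean(self.y)))

    def forward(self, distance: float) -> None:
        rad = math.radians(self.heading)
        self.x += distance * math.cos(rad)
        self.y += distance * math.sin(rad)
        self.path.append((_clean(self.x), _clean(self.y)))

    def left(self, angle: float) -> None:
        self.heading = (self.heading + angle) % 360

    def right(self, angle: float) -> None:
        self.left(-angle)


def polygon(t: Turtle, sides: int, length: float) -> None:
    """Regular polygon: the exterior angles sum to 360 degrees."""
    for _ in range(sides):
        t.forward(length)
        t.left(360 / sides)


def rotated_squares(t: Turtle, copies: int, length: float) -> None:
    """A simple 'patterned shape': squares rotated evenly about one corner."""
    for _ in range(copies):
        polygon(t, 4, length)
        t.left(360 / copies)


if __name__ == "__main__":
    t = Turtle()
    polygon(t, 4, 100)
    print(t.path)
```

The loop structure is the point: a square is "forward, turn 90 degrees" four times, and the rotated pattern is just that square repeated with an extra turn, which is the kind of algorithmic regularity a model must infer from an image or instruction before writing code.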

Code & Models

Repositories

sinaris76/turtlebench (official)

Videos

TurtleBench: A Visual Programming Benchmark in Turtle Geometry (Underline)

Taxonomy

Topics: Data Visualization and Analytics · Artificial Intelligence in Games