ShapeCodeBench: A Renewable Benchmark for Perception-to-Program Reconstruction of Synthetic Shape Scenes

Shivam Kumar

arXiv:2605.11680 · cs.CV · May 13, 2026

TL;DR

ShapeCodeBench is a synthetic benchmark for perception-to-program reconstruction of shape scenes, evaluating models on their ability to generate executable drawing programs from images.

Contribution

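The paper introduces a synthetic benchmark whose instances are generated from a seeded RNG, so fresh held-out sets can be created on demand. It ships a frozen eval_v1 split of 150 samples across easy, medium, and hard tiers, scored by exact match, pixel accuracy, foreground IoU, parse success, and execution success.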

Findings

01

The classical computer-vision heuristic is competitive on easy scenes but collapses when overlapping shapes fuse into a single component.

02

GPT-5.5 achieves the highest exact match among tested models.

03

The benchmark remains challenging, with low overall exact match scores.

Abstract

We introduce ShapeCodeBench, a synthetic benchmark for perception-to-program reconstruction: given a rendered raster image, a model must emit an executable drawing program that a deterministic evaluator re-renders and compares with the target. The v1 DSL has four primitives on a 512 x 512 black-on-white canvas, but every instance is generated from a seeded RNG, so fresh held-out sets can be created to reduce exact-instance contamination. We release a frozen eval_v1 split with 150 samples across easy, medium, and hard tiers, scored by exact match, pixel accuracy, foreground IoU, parse success, and execution success. We evaluate an empty-program floor, a classical computer-vision heuristic, Claude Opus 4.7 at high and max effort, and GPT-5.5 at medium and extra_high reasoning effort. The heuristic is competitive on easy scenes but collapses when overlaps fuse components; the strongest…
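The image-level metrics named in the abstract can be illustrated with a minimal sketch. This listing does not give the paper's exact formulas, so the definitions below are the common ones and should be read as assumptions: images are binary grids (1 = black foreground, 0 = white background), and the function names, the `Image` alias, and the choice to score two empty canvases as IoU 1.0 are all hypothetical, not taken from the benchmark's evaluator.

```python
from typing import List

# Assumed representation: 1 = foreground (black ink), 0 = background (white).
Image = List[List[int]]


def _check_shapes(pred: Image, target: Image) -> None:
    """Raise if the two images do not share the same dimensions."""
    if len(pred) != len(target) or any(
        len(p) != len(t) for p, t in zip(pred, target)
    ):
        raise ValueError("pred and target must have identical dimensions")


def exact_match(pred: Image, target: Image) -> bool:
    """True only if every pixel agrees."""
    _check_shapes(pred, target)
    return pred == target


def pixel_accuracy(pred: Image, target: Image) -> float:
    """Fraction of pixels (foreground and background) that agree."""
    _check_shapes(pred, target)
    total = sum(len(row) for row in target)
    same = sum(
        p == t
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )
    return same / total


def foreground_iou(pred: Image, target: Image) -> float:
    """Intersection over union of the foreground pixels."""
    _check_shapes(pred, target)
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += p & t
            union += p | t
    # Assumption: two empty canvases are treated as a perfect match.
    return 1.0 if union == 0 else inter / union


# Toy 4 x 4 example: a 2 x 2 square, and a prediction shifted one column right.
target = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]

print(exact_match(pred, target))     # False
print(pixel_accuracy(pred, target))  # 0.75
print(foreground_iou(pred, target))  # 2 shared pixels out of 6 in the union
```

The toy case shows why the three scores are reported together: a one-pixel shift leaves pixel accuracy at 0.75 because the background dominates, while foreground IoU drops to one third and exact match fails outright.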

Code & Models

Repositories

shivamk3r/shape-code-bench
github

Datasets

shivamk3r/shape-code-bench-eval-v1
dataset · 112 dl
