Vision2Code: A Multi-Domain Benchmark for Evaluating Image-to-Code Generation

Ajay Vikram Periasami; Junlin Wang; Bhuwan Dhingra

arXiv:2605.11307·cs.CV·May 13, 2026

TL;DR

Vision2Code is a reference-code-free benchmark of 2,169 test examples from 15 source datasets for evaluating image-to-code generation. Generated programs are rendered and scored against the source image using dataset-specific rubrics intended to match human judgment.

Contribution

The paper contributes a reference-code-free, multi-domain benchmark; an evaluation framework that scores rendered outputs with a VLM rater, dataset-specific rubrics, and deterministic guardrails; and an analysis of domain-dependent model performance and training improvements.

Findings

01

Models perform well on charts and graphs but poorly on spatial scenes and diagrams.

02

The benchmark's evaluation aligns better with human judgment than previous evaluation methods.

03

Training with filtered outputs improves model performance on the benchmark.

Abstract

Image-to-code generation tests whether a vision-language model (VLM) can recover the structure of an image well enough to express it as executable code. Existing benchmarks either focus on narrow visual domains, depend on paired executable reference code, or rely on generic rubrics that miss domain-specific reconstruction errors. We introduce Vision2Code, a reference-code-free benchmark and evaluation framework for multi-domain image-to-code generation. Vision2Code contains 2,169 test examples from 15 source datasets that span charts and plots, geometry, graphs, scientific imagery, documents, and 3D spatial scenes. Models generate executable programs, which we render and score against the source image using a VLM rater with dataset-specific rubrics and deterministic guardrails for severe semantic failures. We report render-success diagnostics that separate code execution failures from…
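The scoring flow described in the abstract (render the generated program, rate the render against the source image, apply guardrails, and report render success separately) can be pictured with a small aggregation sketch. This is not the paper's implementation: the `Result` fields, the [0, 1] score scale, the `GUARDRAIL_CAP` value, and the choice to count a failed render as zero are all assumptions made for illustration, and the rater score is taken as an input rather than computed.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Result:
    """One benchmark example after generation and rendering (hypothetical schema)."""
    dataset: str
    rendered: bool                 # did the generated program run and produce an image?
    rater_score: Optional[float]   # assumed rubric score in [0, 1]; None if nothing rendered
    severe_failure: bool = False   # assumed flag set by a deterministic guardrail


# Assumed ceiling applied when a guardrail flags a severe semantic failure.
GUARDRAIL_CAP = 0.2


def final_score(r: Result) -> float:
    """Combine render status, rater score, and guardrail into one number."""
    if not r.rendered or r.rater_score is None:
        return 0.0  # assumption: an unrenderable program earns no credit
    if r.severe_failure:
        return min(r.rater_score, GUARDRAIL_CAP)
    return r.rater_score


def summarize(results: list[Result]) -> dict[str, float]:
    """Report render success separately from scores so the two are not conflated."""
    if not results:
        raise ValueError("results must not be empty")
    rendered = [r for r in results if r.rendered]
    return {
        "render_success_rate": len(rendered) / len(results),
        "score_all": mean(final_score(r) for r in results),
        "score_rendered_only": mean(final_score(r) for r in rendered) if rendered else 0.0,
    }


if __name__ == "__main__":
    demo = [
        Result("charts", True, 0.9),
        Result("charts", True, 0.8, severe_failure=True),
        Result("3d_scenes", False, None),
        Result("geometry", True, 0.5),
    ]
    print(summarize(demo))
```

Reporting `score_all` next to `score_rendered_only` shows how much of a low overall score comes from programs that never executed rather than from renders that ran but looked wrong.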

Code & Models

Repositories

https://image2code.github.io/vision2code