VectorGym: A Multitask Benchmark for SVG Code Generation, Sketching, and Editing

Juan Rodriguez; Haotian Zhang; Abhay Puri; Tianyang Zhang; Rishav Pramanik; Meng Lin; Xiaoqing Xie; Marco Terral; Darsh Kaushik; Aly Shariff; Perouz Taslakian; Spandana Gella; Sai Rajeswar; David Vazquez; Christopher Pal; Marco Pedersoli

arXiv:2603.29852·cs.GR·April 1, 2026

TL;DR

VectorGym is a benchmark suite for SVG code generation, editing, and understanding. It comprises four tasks with human annotations and is accompanied by a multi-task RL approach that achieves state-of-the-art results among open-source models.

Contribution

The paper introduces four challenging SVG tasks with expert human annotations (VG-Sketch, VG-Edit, VG-Text, and VG-Cap) and a multi-task RL training method that outperforms larger models.

Findings

01. Qwen3-VL 8B achieves state-of-the-art performance among open-source models.

02. The VLM-as-a-Judge metric correlates well with human judgments.

03. Significant performance gaps remain for current models on SVG understanding tasks.

Abstract

We introduce VectorGym, a comprehensive benchmark suite for Scalable Vector Graphics (SVG) that spans generation from text and sketches, complex editing, and visual understanding. VectorGym addresses the lack of realistic, challenging benchmarks aligned with professional design workflows. Our benchmark comprises four tasks with expert human-authored annotations: the novel Sketch2SVG task (VG-Sketch); a new SVG editing dataset (VG-Edit) featuring complex, multi-step edits with higher-order primitives; Text2SVG generation (VG-Text); and SVG captioning (VG-Cap). Unlike prior benchmarks that rely on synthetic edits, VectorGym provides gold-standard human annotations that require semantic understanding and design intent. We also propose a multi-task reinforcement learning approach that jointly optimizes across all four tasks using rendering-based rewards. Our method, built on GRPO with…
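
To make the idea of GRPO with rendering-based rewards more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract is truncated before it describes the method, so everything below is an assumption for illustration. The function `pixel_reward` is a hypothetical stand-in for a rendering-based reward (it compares two already-rasterized grayscale images given as nested lists), and `group_relative_advantages` shows the group-wise reward normalization commonly associated with GRPO, where each sampled output is scored relative to the other samples for the same prompt.

```python
from statistics import mean, pstdev


def pixel_reward(rendered, target):
    """Hypothetical rendering-based reward (an assumption, not from the paper).

    Both inputs are equal-size grayscale rasters with values in [0, 1],
    given as lists of rows. Returns 1 minus the mean absolute pixel error,
    so identical images score 1.0.
    """
    diffs = [
        abs(a - b)
        for row_r, row_t in zip(rendered, target)
        for a, b in zip(row_r, row_t)
    ]
    return 1.0 - mean(diffs)


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps). The use of
    the population standard deviation and the eps value are assumptions.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: score a group of candidate renderings against one target image.
target = [[0.0, 1.0], [1.0, 0.0]]
candidates = [
    [[0.0, 1.0], [1.0, 0.0]],  # exact match
    [[0.0, 1.0], [0.0, 0.0]],  # one pixel wrong
    [[1.0, 0.0], [0.0, 1.0]],  # fully inverted
]
rewards = [pixel_reward(c, target) for c in candidates]
advantages = group_relative_advantages(rewards)
```

In this sketch the candidate closest to the target receives the largest advantage and the inverted one the smallest, which is the signal a policy-gradient update would then use. A multi-task setup would presumably swap in a different reward function per task, but how the paper combines them is not stated in the text above.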
