Teaching an Agent to Sketch One Part at a Time

Xiaodan Du; Ruize Xu; David Yunis; Yael Vinker; Greg Shakhnarovich

arXiv:2603.19500 · cs.AI · April 27, 2026

TL;DR

This paper introduces a method for generating vector sketches one part at a time. A multi-modal language-model agent is trained with supervised fine-tuning followed by multi-turn process-reward reinforcement learning, using ControlSketch-Part, a new dataset with part-level annotations.

Contribution

The paper presents ControlSketch-Part, a new dataset with part-level annotations, and a reinforcement learning framework for controllable, interpretable sketch generation one part at a time.

Findings

01 Structured part-level data improves sketch controllability.

02 Visual feedback enhances interpretability and local editability.

03 The method achieves more controllable sketch synthesis.

Abstract

We develop a method for producing vector sketches one part at a time. To do this, we train a multi-modal language model-based agent using a novel multi-turn process-reward reinforcement learning procedure following supervised fine-tuning. Our approach is enabled by a new dataset we call ControlSketch-Part, containing rich part-level annotations for sketches, obtained using a novel, generic automatic annotation pipeline that segments vector sketches into semantic parts and assigns paths to parts with a structured multi-stage labeling process. Our results indicate that incorporating structured part-level data and providing the agent with visual feedback throughout the process enables interpretable, controllable, and locally editable text-to-vector sketch generation.
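The part-by-part loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Part`, `SketchState`, `run_episode`, `edit_part`, `toy_policy`, and `toy_reward` are all hypothetical, paths are assumed to be simple polylines, and the "visual feedback" is stood in for by the list of paths drawn so far rather than a rendered image. It only shows the general shape of a multi-turn episode in which each turn adds one part and receives its own reward.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Path = List[Point]  # assumed stroke representation: a polyline of points


@dataclass
class Part:
    name: str          # semantic part label, e.g. "head"
    paths: List[Path]  # the vector paths assigned to this part


@dataclass
class SketchState:
    prompt: str
    parts: List[Part] = field(default_factory=list)

    def render(self) -> List[Path]:
        # Stand-in for rasterization: return every path drawn so far.
        return [p for part in self.parts for p in part.paths]


def run_episode(
    prompt: str,
    part_names: List[str],
    policy: Callable[[str, str, List[Path]], List[Path]],
    process_reward: Callable[[Part, List[Path]], float],
) -> Tuple[SketchState, List[float]]:
    """One turn per part; each turn sees the canvas and gets its own reward."""
    state = SketchState(prompt)
    rewards: List[float] = []
    for name in part_names:
        canvas = state.render()               # feedback from earlier turns
        part = Part(name, policy(prompt, name, canvas))
        state.parts.append(part)
        rewards.append(process_reward(part, canvas))  # per-turn reward
    return state, rewards


def edit_part(state: SketchState, name: str, new_paths: List[Path]) -> None:
    """Local edit: replace one part's paths and leave the others untouched."""
    for part in state.parts:
        if part.name == name:
            part.paths = new_paths


# Toy stand-ins (hypothetical): draw one horizontal stroke per part,
# stacked by how many strokes are already on the canvas.
def toy_policy(prompt: str, part_name: str, canvas: List[Path]) -> List[Path]:
    y = float(len(canvas))
    return [[(0.0, y), (1.0, y)]]


def toy_reward(part: Part, canvas_before: List[Path]) -> float:
    return 1.0 if part.paths else 0.0


state, rewards = run_episode("a cat", ["head", "body", "tail"], toy_policy, toy_reward)
edit_part(state, "body", [[(0.0, 9.0), (1.0, 9.0)]])
```

Keeping the paths grouped by part is what makes the local edit a single replacement: the other parts' paths are never touched. In a real system the policy would be the trained agent and the reward would come from whatever process-level signal the training procedure defines; neither is specified here.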

Code & Models

Datasets

duxiaodan/ControlSketch-Part
dataset · 228 downloads
