Teaching an Agent to Sketch One Part at a Time
Xiaodan Du, Ruize Xu, David Yunis, Yael Vinker, Greg Shakhnarovich

TL;DR
This paper introduces a method for generating vector sketches one part at a time: a multi-modal language-model agent is trained with supervised fine-tuning followed by multi-turn process-reward reinforcement learning, using a new dataset with part-level annotations.
Contribution
It presents a new dataset with part-level annotations and a reinforcement learning framework for controllable, interpretable sketch generation one part at a time.
Findings
Structured part-level data improves sketch controllability.
Visual feedback enhances interpretability and local editability.
The trained agent produces text-to-vector sketches that are interpretable, controllable, and locally editable.
Abstract
We develop a method for producing vector sketches one part at a time. To do this, we train a multi-modal language model-based agent using a novel multi-turn process-reward reinforcement learning procedure following supervised fine-tuning. Our approach is enabled by a new dataset we call ControlSketch-Part, containing rich part-level annotations for sketches, obtained using a novel, generic automatic annotation pipeline that segments vector sketches into semantic parts and assigns paths to parts with a structured multi-stage labeling process. Our results indicate that incorporating structured part-level data and providing the agent with visual feedback throughout the process enable interpretable, controllable, and locally editable text-to-vector sketch generation.
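The part-at-a-time loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: every name in it (`Path`, `Part`, `SketchState`, `run_episode`, `replace_part`, `toy_policy`, `toy_reward`) is hypothetical, the policy and reward are trivial stand-ins for the trained agent and the process reward, and the state object stands in for the rendered canvas that the agent would receive as visual feedback.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Point = Tuple[float, float]


@dataclass
class Path:
    """One vector stroke, stored here as a polyline (an assumption)."""
    points: List[Point]


@dataclass
class Part:
    """A semantic part: a text label plus the paths assigned to it."""
    label: str
    paths: List[Path]


@dataclass
class SketchState:
    """Canvas so far. Parts stay separate so one part can be edited alone."""
    prompt: str
    parts: List[Part] = field(default_factory=list)

    def path_count(self) -> int:
        return sum(len(part.paths) for part in self.parts)


Policy = Callable[[SketchState, str], List[Path]]
RewardFn = Callable[[SketchState, str], float]


def run_episode(prompt: str, part_labels: List[str],
                policy: Policy, reward_fn: RewardFn):
    """One turn per part; a reward is recorded after every turn."""
    state = SketchState(prompt)
    rewards: List[float] = []
    for label in part_labels:
        # The policy sees the current canvas before drawing the next part.
        state.parts.append(Part(label, policy(state, label)))
        rewards.append(reward_fn(state, label))
    return state, rewards


def replace_part(state: SketchState, label: str, new_paths: List[Path]) -> None:
    """Local edit: swap the paths of one part, leaving the others untouched."""
    for part in state.parts:
        if part.label == label:
            part.paths = new_paths
            return
    raise KeyError(label)


def toy_policy(state: SketchState, label: str) -> List[Path]:
    """Stand-in agent: draws one horizontal segment per part."""
    y = float(len(state.parts))
    return [Path([(0.0, y), (1.0, y)])]


def toy_reward(state: SketchState, label: str) -> float:
    """Stand-in process reward: 1.0 if the newest part has any paths."""
    return 1.0 if state.parts[-1].paths else 0.0
```

Calling `run_episode("a cat", ["head", "body", "tail"], toy_policy, toy_reward)` returns a state with three labelled parts and one reward per turn, and `replace_part` changes a single part without touching the rest, which is the sense of "locally editable" used above.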
