SketchAgent: Language-Driven Sequential Sketch Generation
Yael Vinker, Tamar Rott Shaham, Kristine Zheng, Alex Zhao, Judith E. Fan, Antonio Torralba

TL;DR
SketchAgent is a novel language-driven system that enables dynamic, conversational sketch creation and refinement using large language models, without requiring additional training.
Contribution
It introduces a training-free approach that uses off-the-shelf multimodal LLMs for sequential sketch generation. The model learns a new sketching language from in-context examples and draws by emitting string-based actions.
Findings
Capable of generating sketches from diverse prompts
Engages in dialogue-driven drawing and collaboration
Operates without additional training or fine-tuning
Abstract
Sketching serves as a versatile tool for externalizing ideas, enabling rapid exploration and visual communication that spans various disciplines. While artificial systems have driven substantial advances in content creation and human-computer interaction, capturing the dynamic and abstract nature of human sketching remains challenging. In this work, we introduce SketchAgent, a language-driven, sequential sketch generation method that enables users to create, modify, and refine sketches through dynamic, conversational interactions. Our approach requires no training or fine-tuning. Instead, we leverage the sequential nature and rich prior knowledge of off-the-shelf multimodal large language models (LLMs). We present an intuitive sketching language, introduced to the model through in-context examples, enabling it to "draw" using string-based actions. These are processed into vector…
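To make the idea of "string-based actions" concrete, here is a minimal sketch of how text emitted by a model could be turned into vector strokes. The action format (`stroke: x,y x,y ...`), the function names, and the straight-line SVG rendering are all assumptions made for illustration. The abstract above does not specify SketchAgent's actual sketching language or its processing pipeline, so this should not be read as the authors' implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def parse_stroke(action: str) -> List[Point]:
    """Parse a hypothetical action string such as 'stroke: 10,10 50,10'.

    The 'label: x,y x,y ...' format is an assumption, not SketchAgent's.
    """
    _, _, body = action.partition(":")
    points: List[Point] = []
    for token in body.split():
        x_str, y_str = token.split(",")
        points.append((float(x_str), float(y_str)))
    return points


def stroke_to_path(points: List[Point]) -> str:
    """Build an SVG path 'd' attribute that joins points with straight lines."""
    if not points:
        return ""
    head = f"M {points[0][0]:g} {points[0][1]:g}"
    tail = " ".join(f"L {x:g} {y:g}" for x, y in points[1:])
    return f"{head} {tail}".strip()


def actions_to_svg(actions: List[str], size: int = 100) -> str:
    """Render a sequence of stroke actions, in order, as one SVG document."""
    paths = []
    for action in actions:
        d = stroke_to_path(parse_stroke(action))
        if d:
            paths.append(f'<path d="{d}" fill="none" stroke="black"/>')
    body = "\n".join(paths)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {size} {size}">\n'
        f"{body}\n</svg>"
    )


if __name__ == "__main__":
    # Two hypothetical strokes: a square outline and a diagonal.
    print(actions_to_svg([
        "stroke: 10,10 90,10 90,90 10,90 10,10",
        "stroke: 10,10 90,90",
    ]))
```

Because each action is a separate string, strokes can be appended, inspected, or replaced one at a time, which is the property that makes a sequential, conversational editing loop possible.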
Taxonomy
Topics: Human Motion and Animation · Interactive and Immersive Displays · Augmented Reality Applications
