SketchAgent: Language-Driven Sequential Sketch Generation

Yael Vinker; Tamar Rott Shaham; Kristine Zheng; Alex Zhao; Judith E. Fan; Antonio Torralba

arXiv:2411.17673·cs.CV·November 27, 2024

Open Access

TL;DR

SketchAgent is a novel language-driven system that enables dynamic, conversational sketch creation and refinement using large language models, without requiring additional training.

Contribution

The paper introduces a training-free approach to sequential sketch generation: an off-the-shelf multimodal LLM is taught a new sketching language through in-context examples and then draws by emitting string-based actions.

Findings

01. Capable of generating sketches from diverse prompts

02. Engages in dialogue-driven drawing and collaboration

03. Operates without additional training or fine-tuning

Abstract

Sketching serves as a versatile tool for externalizing ideas, enabling rapid exploration and visual communication that spans various disciplines. While artificial systems have driven substantial advances in content creation and human-computer interaction, capturing the dynamic and abstract nature of human sketching remains challenging. In this work, we introduce SketchAgent, a language-driven, sequential sketch generation method that enables users to create, modify, and refine sketches through dynamic, conversational interactions. Our approach requires no training or fine-tuning. Instead, we leverage the sequential nature and rich prior knowledge of off-the-shelf multimodal large language models (LLMs). We present an intuitive sketching language, introduced to the model through in-context examples, enabling it to "draw" using string-based actions. These are processed into vector…
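To make the idea of "drawing" with string-based actions more concrete, here is a minimal sketch in Python. It is not the authors' implementation, and the abstract does not specify the actual sketching language. The action format (`x<col>y<row>` grid cells separated by spaces), the grid size, and the function names `parse_stroke`, `stroke_to_path`, and `strokes_to_svg` are all assumptions made for illustration. The example only shows how text emitted by a language model could be turned into vector paths.

```python
import re

# ASSUMPTION: hypothetical action format, not the paper's actual syntax.
# Each stroke is a string of grid cells, e.g. "x10y20 x15y25 x20y20".
CELL = re.compile(r"x(\d+)y(\d+)")


def parse_stroke(action: str) -> list[tuple[int, int]]:
    """Turn one string-based action into a list of (col, row) grid points."""
    return [(int(x), int(y)) for x, y in CELL.findall(action)]


def stroke_to_path(points, cell_size=10):
    """Scale grid points to pixels and emit SVG path data made of straight segments."""
    cmds = []
    for i, (x, y) in enumerate(points):
        op = "M" if i == 0 else "L"  # move to the first point, then draw lines
        cmds.append(f"{op} {x * cell_size} {y * cell_size}")
    return " ".join(cmds)


def strokes_to_svg(actions, grid=50, cell_size=10):
    """Render a sequence of stroke actions as a single SVG document string."""
    size = grid * cell_size
    paths = []
    for action in actions:
        points = parse_stroke(action)
        if len(points) < 2:  # a stroke needs at least two points
            continue
        d = stroke_to_path(points, cell_size)
        paths.append(f'<path d="{d}" fill="none" stroke="black"/>')
    body = "\n".join(paths)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">\n'
        f"{body}\n</svg>"
    )


if __name__ == "__main__":
    # Two hypothetical strokes, as a model might emit them one after another.
    print(strokes_to_svg(["x10y40 x25y10 x40y40", "x10y40 x40y40"]))
```

Because each stroke is a separate string, strokes can be appended one at a time, which is what makes this kind of representation a natural fit for sequential, turn-by-turn drawing.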

Taxonomy

Topics: Human Motion and Animation · Interactive and Immersive Displays · Augmented Reality Applications