VideoSketcher: Video Models Prior Enable Versatile Sequential Sketch Generation

Hui Ren; Yuval Alaluf; Omer Bar Tal; Alexander Schwing; Antonio Torralba; Yael Vinker

arXiv:2602.15819 · cs.CV · February 18, 2026

Open Access

TL;DR

This paper introduces a data-efficient method for generating sequential sketches by adapting pretrained text-to-video diffusion models, combining semantic stroke planning from language models with high-quality visual rendering.

Contribution

It presents a novel two-stage fine-tuning approach that leverages limited human sketch data and synthetic shapes to produce controllable, high-quality sequential sketches guided by text instructions.

Findings

01 High-quality sketches closely follow text-specified orderings

02 Method works with as few as seven human-drawn sketches

03 Extensions enable style conditioning and interactive drawing

Abstract

Sketching is inherently a sequential process, in which strokes are drawn in a meaningful order to explore and refine ideas. However, most generative models treat sketches as static images, overlooking the temporal structure that underlies creative drawing. We present a data-efficient approach for sequential sketch generation that adapts pretrained text-to-video diffusion models to generate sketching processes. Our key insight is that large language models and video diffusion models offer complementary strengths for this task: LLMs provide semantic planning and stroke ordering, while video diffusion models serve as strong renderers that produce high-quality, temporally coherent visuals. We leverage this by representing sketches as short videos in which strokes are progressively drawn on a blank canvas, guided by text-specified ordering instructions. We introduce a two-stage fine-tuning…
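
To make the "sketch as a short video" representation concrete, the following is a minimal toy illustration, not the paper's pipeline: it takes an ordered list of strokes and produces a frame sequence in which each frame adds one more stroke to a blank canvas. The names `draw_segment`, `render_sketch_video`, and `ink`, the polyline stroke format, and the 16x16 binary canvas are all assumptions made for this example. The actual method renders frames with a fine-tuned video diffusion model rather than by rasterizing known strokes.

```python
from typing import List, Tuple

Point = Tuple[int, int]          # (x, y) pixel coordinates (hypothetical format)
Stroke = List[Point]             # a stroke is a polyline through its points
Frame = List[List[int]]          # 0 = blank canvas, 1 = ink


def draw_segment(canvas: Frame, p0: Point, p1: Point) -> None:
    """Rasterize one line segment by linear interpolation between endpoints."""
    (x0, y0), (x1, y1) = p0, p1
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    for i in range(steps + 1):
        x = round(x0 + (x1 - x0) * i / steps)
        y = round(y0 + (y1 - y0) * i / steps)
        # Ignore points that fall outside the canvas.
        if 0 <= y < len(canvas) and 0 <= x < len(canvas[0]):
            canvas[y][x] = 1


def render_sketch_video(strokes: List[Stroke], size: int = 16) -> List[Frame]:
    """Return frames where frame k shows the first k strokes, in the given order."""
    canvas = [[0] * size for _ in range(size)]
    frames = [[row[:] for row in canvas]]      # frame 0: blank canvas
    for stroke in strokes:
        for p0, p1 in zip(stroke, stroke[1:]):
            draw_segment(canvas, p0, p1)
        frames.append([row[:] for row in canvas])  # snapshot after each stroke
    return frames


def ink(frame: Frame) -> int:
    """Count inked pixels in a frame."""
    return sum(map(sum, frame))


# The stroke order stands in for a text-specified ordering instruction.
strokes = [
    [(2, 2), (12, 2)],             # first: top edge
    [(2, 2), (2, 12), (12, 12)],   # second: left and bottom edges
]
frames = render_sketch_video(strokes)
```

Reordering `strokes` changes the intermediate frames but not the final one, which is the temporal structure that a static-image representation discards.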

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Interactive and Immersive Displays · 3D Shape Modeling and Analysis