4-Doodle: Text to 3D Sketches that Move!
Hao Chen, Jiaqi Wang, Yonggang Qi, Ke Li, Kaiyue Pang, Yi-Zhe Song

TL;DR
This paper introduces 4-Doodle, a training-free framework that generates dynamic, view-consistent 3D sketches from text, leveraging pretrained image and video diffusion models and a dual-space approach for structural and motion coherence.
Contribution
The paper presents the first training-free method for text-to-3D sketch animation using dual-space diffusion, addressing multi-view consistency and expressive motion in sparse sketches.
Findings
Produces realistic and stable 3D sketch animations
Outperforms existing methods in fidelity and controllability
Enables expressive motions like flipping and articulated movement
Abstract
We present a novel task: text-to-3D sketch animation, which aims to bring freeform sketches to life in dynamic 3D space. Unlike prior works focused on photorealistic content generation, we target sparse, stylized, and view-consistent 3D vector sketches, a lightweight and interpretable medium well-suited for visual communication and prototyping. However, this task is very challenging: (i) no paired dataset exists for text and 3D (or 4D) sketches; (ii) sketches require structural abstraction that is difficult to model with conventional 3D representations like NeRFs or point clouds; and (iii) animating such sketches demands temporal coherence and multi-view consistency, which current pipelines do not address. Therefore, we propose 4-Doodle, the first training-free framework for generating dynamic 3D sketches from text. It leverages pretrained image and video diffusion models through a…
