Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints
Gaurav Rai, Ojaswa Sharma

TL;DR
This paper introduces a novel method for animating hand-drawn sketches from text prompts, ensuring temporal consistency and shape preservation using diffusion models and specialized regularizations.
Contribution
It proposes a new approach that combines a pre-trained text-to-video diffusion model with length-area (LA) and As-Rigid-As-Possible (ARAP) regularizations for improved sketch animation.
Findings
Outperforms state-of-the-art methods on quantitative metrics
Achieves more accurate and consistent sketch animations
Preserves sketch topology effectively
Abstract
Animating hand-drawn sketches using traditional tools is challenging and complex. Sketches provide a visual basis for explanations, and animating these sketches offers an experience of real-time scenarios. We propose an approach for animating a given input sketch based on a descriptive text prompt. Our method utilizes a parametric representation of the sketch's strokes. Unlike previous methods, which struggle to estimate smooth and accurate motion and often fail to preserve the sketch's topology, we leverage a pre-trained text-to-video diffusion model with SDS loss to guide the motion of the sketch's strokes. We introduce length-area (LA) regularization to ensure temporal consistency by accurately estimating the smooth displacement of control points across the frame sequence. Additionally, to preserve shape and avoid topology changes, we apply a shape-preserving As-Rigid-As-Possible…
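The abstract describes strokes as parametric curves whose control points are displaced from frame to frame. The following is a minimal illustrative sketch of that idea, assuming each stroke is a cubic Bézier curve (the abstract does not state the curve type). The function names and the `length_variation` penalty are hypothetical stand-ins chosen for illustration; they are not the paper's LA regularization or its ARAP term, whose definitions are not given here.

```python
import numpy as np

def cubic_bezier(ctrl, n=50):
    """Sample n points on a cubic Bezier curve from 4 control points, shape (4, 2)."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t) ** 3 * ctrl[0]
            + 3 * (1 - t) ** 2 * t * ctrl[1]
            + 3 * (1 - t) * t ** 2 * ctrl[2]
            + t ** 3 * ctrl[3])

def polyline_length(pts):
    """Approximate curve length as the summed length of the sampled segments."""
    return float(np.linalg.norm(np.diff(pts, axis=0), axis=1).sum())

def animate(ctrl, displacements):
    """Per-frame control points: the input stroke plus a displacement per frame."""
    return ctrl[None] + displacements

def length_variation(frames):
    """Hypothetical penalty: squared change in stroke length between consecutive frames."""
    lengths = np.array([polyline_length(cubic_bezier(f)) for f in frames])
    return float(np.sum(np.diff(lengths) ** 2))

# One stroke with four collinear, evenly spaced control points.
ctrl = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])

# A rigid translation over 5 frames keeps the stroke length constant.
shift = np.linspace(0.0, 1.0, 5)[:, None, None] * np.array([1.0, 0.5])
rigid_frames = animate(ctrl, shift)

# A progressive stretch changes the stroke length from frame to frame.
stretch = np.linspace(0.0, 1.0, 5)[:, None, None] * ctrl
stretched_frames = animate(ctrl, stretch)

print(length_variation(rigid_frames), length_variation(stretched_frames))
```

In this toy setup, a rigid translation incurs essentially no penalty, while a motion that stretches the stroke is penalized, which is the kind of behavior a shape-preserving regularizer is meant to encourage.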
Taxonomy
Topics: Human Motion and Animation · Music and Audio Processing
Methods: Diffusion
