Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints

Gaurav Rai; Ojaswa Sharma

arXiv:2411.19381 · cs.CV · February 27, 2026

TL;DR

This paper introduces a novel method for animating hand-drawn sketches from text prompts, ensuring temporal consistency and shape preservation using diffusion models and specialized regularizations.

Contribution

The paper proposes a new approach that combines a pre-trained text-to-video diffusion model with length-area (LA) and As-Rigid-As-Possible (ARAP) regularizations for improved sketch animation.

Findings

01. Outperforms state-of-the-art methods on quantitative metrics

02. Achieves more accurate and consistent sketch animations

03. Preserves sketch topology effectively

Abstract

Animating hand-drawn sketches using traditional tools is challenging and complex. Sketches provide a visual basis for explanations, and animating these sketches offers an experience of real-time scenarios. We propose an approach for animating a given input sketch based on a descriptive text prompt. Our method utilizes a parametric representation of the sketch's strokes. Unlike previous methods, which struggle to estimate smooth and accurate motion and often fail to preserve the sketch's topology, we leverage a pre-trained text-to-video diffusion model with SDS loss to guide the motion of the sketch's strokes. We introduce length-area (LA) regularization to ensure temporal consistency by accurately estimating the smooth displacement of control points across the frame sequence. Additionally, to preserve shape and avoid topology changes, we apply a shape-preserving As-Rigid-As-Possible…
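To make the idea of a parametric stroke with per-frame control point displacements concrete, here is a minimal sketch. It assumes each stroke is a cubic Bézier curve with four 2D control points, which the abstract does not state. The function names (`cubic_bezier`, `animate_stroke`, `length_variation`) are hypothetical, and the length penalty is only an illustrative proxy for temporal consistency, not the paper's LA regularization.

```python
import numpy as np

def cubic_bezier(ctrl, n=50):
    """Sample n points on a cubic Bezier curve; ctrl has shape (4, 2)."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    p0, p1, p2, p3 = ctrl
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)

def polyline_length(pts):
    """Approximate curve length as the sum of sampled segment lengths."""
    return float(np.linalg.norm(np.diff(pts, axis=0), axis=1).sum())

def animate_stroke(ctrl, displacements):
    """Apply per-frame control point displacements of shape (F, 4, 2)."""
    return [cubic_bezier(ctrl + d) for d in displacements]

def length_variation(frames):
    """Illustrative proxy (an assumption, not the paper's loss): mean squared
    deviation of each frame's stroke length from the first frame's length."""
    lengths = np.array([polyline_length(f) for f in frames])
    return float(np.mean((lengths - lengths[0]) ** 2))

# A straight stroke whose control points are translated rigidly in frame 1.
ctrl = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
displacements = np.zeros((3, 4, 2))
displacements[1] += np.array([1.0, 2.0])
frames = animate_stroke(ctrl, displacements)
```

Under these assumptions, a rigid translation of all control points leaves the stroke length unchanged, so the proxy penalty stays at zero; displacements that stretch or shrink the stroke would raise it.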

Taxonomy

Topics: Human Motion and Animation · Music and Audio Processing

Methods: Diffusion