Sakuga-42M Dataset: Scaling Up Cartoon Research

Zhenglin Pan

arXiv:2405.07425 · cs.CV · May 24, 2024 · 1 citation

Open Access

TL;DR

This paper introduces Sakuga-42M, the largest cartoon animation dataset with 42 million keyframes, enabling improved understanding and generation of cartoons through large-scale model fine-tuning.

Contribution

The creation of Sakuga-42M, a large-scale cartoon dataset with extensive annotations, and a demonstration of its effectiveness in improving cartoon comprehension and generation tasks.

Findings

01

Fine-tuning foundation models on Sakuga-42M improves cartoon task performance.

02

The dataset covers diverse artistic styles, regions, and years.

03

Large-scale cartoon data benefits model generalization and robustness.

Abstract

Hand-drawn cartoon animation employs sketches and flat-color segments to create the illusion of motion. While recent advancements like CLIP, SVD, and Sora show impressive results in understanding and generating natural video by scaling large models with extensive datasets, they are not as effective for cartoons. Through our empirical experiments, we argue that this ineffectiveness stems from a notable bias in hand-drawn cartoons that diverges from the distribution of natural videos. Can we harness the success of the scaling paradigm to benefit cartoon research? Unfortunately, until now, there has not been a sizable cartoon dataset available for exploration. In this research, we propose the Sakuga-42M Dataset, the first large-scale cartoon animation dataset. Sakuga-42M comprises 42 million keyframes covering various artistic styles, regions, and years, with comprehensive semantic…

Taxonomy

Topics: Comics and Graphic Narratives

Methods: Contrastive Language-Image Pre-training