MagicAnime: A Hierarchically Annotated, Multimodal and Multitasking Dataset with Benchmarks for Cartoon Animation Generation

Shuolin Xu; Bingyuan Wang; Zeyu Cai; Fangteng Fu; Yue Ma; Tongyi Lee; Hongchuan Yu; Zeyu Wang

arXiv:2507.20368·cs.CV·July 29, 2025

TL;DR

MagicAnime is a large-scale, hierarchically annotated multimodal dataset with benchmarks designed to advance cartoon animation generation, addressing the scarcity of annotated cartoon data and supporting multiple video generation tasks.

Contribution

The paper introduces the MagicAnime dataset with extensive annotations and benchmarks, enabling research in diverse multimodal cartoon animation tasks.

Findings

01. Effective support for high-fidelity, fine-grained animation generation
02. Benchmarks facilitate comparison of different methods
03. Validation through experiments on four key tasks

Abstract

Generating high-quality cartoon animations with multimodal control is challenging due to the complexity of non-human characters, stylistically diverse motions, and fine-grained emotions. There is a huge domain gap between real-world videos and cartoon animation, as cartoon animation is usually abstract and has exaggerated motion. Meanwhile, public multimodal cartoon data are extremely scarce because large-scale automatic annotation is harder for cartoons than for real-life scenarios. To bridge this gap, we propose the MagicAnime dataset, a large-scale, hierarchically annotated, and multimodal dataset designed to support multiple video generation tasks, along with the benchmarks it includes. Containing 400k video clips for image-to-video generation, 50k pairs of video clips and keypoints for whole-body annotation, 12k pairs of video clips for video-to-video face animation, and 2.9k…
