AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics
Tencent HY Team

TL;DR
AniMatrix is a novel anime video generation model that emphasizes artistic expression over physical realism by integrating structured production knowledge and a specialized training curriculum.
Contribution
It introduces a dual-channel conditioning mechanism and a three-step transition (redefine correctness, override the physics prior, distinguish art from failure) to prioritize artistic correctness, overcoming the limitations of physics-biased models.
Findings
Ranks first on four out of five production dimensions in human evaluation
Achieves significant improvements in Prompt Understanding and Artistic Motion
Uses a structured taxonomy and a domain-specific reward model for better control
Abstract
Video generation models internalize physical realism as their prior. Anime deliberately violates physics (smears, impact frames, chibi shifts), and its thousands of coexisting artistic conventions yield no single "physics of anime" a model can absorb. Physics-biased models therefore flatten the artistry that defines the medium or collapse under its stylistic variance. We present AniMatrix, a video generation model that targets artistic rather than physical correctness through a dual-channel conditioning mechanism and a three-step transition: redefine correctness, override the physics prior, and distinguish art from failure. First, a Production Knowledge System encodes anime as a structured taxonomy of controllable production variables (Style, Motion, Camera, VFX), and AniCaption infers these variables from pixels as directorial directives. A trainable tag encoder preserves the…
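
The structured taxonomy mentioned in the abstract can be pictured as a small data structure. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class `ProductionTags`, the method `as_directives`, and the assignment of the example techniques to particular categories are hypothetical and chosen only for illustration. Only the four category names (Style, Motion, Camera, VFX) and the example techniques (smears, impact frames, chibi shifts) come from the abstract.

```python
from dataclasses import dataclass, field
from typing import List

# The four groups of production variables named in the abstract.
CATEGORIES = ("Style", "Motion", "Camera", "VFX")


@dataclass
class ProductionTags:
    """Hypothetical per-clip container for production variables."""
    style: List[str] = field(default_factory=list)
    motion: List[str] = field(default_factory=list)
    camera: List[str] = field(default_factory=list)
    vfx: List[str] = field(default_factory=list)

    def as_directives(self) -> List[str]:
        """Flatten the tags into 'Category: value' strings, in category order."""
        out = []
        for name in CATEGORIES:
            for value in getattr(self, name.lower()):
                out.append(f"{name}: {value}")
        return out


# Which category each technique belongs to is an assumption made here.
tags = ProductionTags(style=["chibi shift"], motion=["smear", "impact frame"])
directives = tags.as_directives()
print(directives)
```

Keeping each category in its own field, instead of one free-form caption string, is what would make the variables individually controllable in a sketch like this. How AniMatrix actually represents and encodes its tags is not specified in the text above.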
