See Before You Code: Learning Visual Priors for Spatially Aware Educational Animation Generation
Yuejia Li, Ke He, Junheng Li, Shutong Chen, Jingkang Xia, Zhiyue Su, Junchi Zhang, and Mang Ye

TL;DR
This paper introduces OmniManim, a framework that generates educational animations from natural language and improves their visual quality through render feedback and explicit visual planning.
Contribution
It presents a novel render-feedback-aware code generation framework with visual planning, datasets, and evaluation protocols for educational animation quality.
Findings
OmniManim improves render quality over baselines on EduRequire-500.
Explicit visual planning components are crucial for quality improvements.
Constructed datasets enable comprehensive evaluation of animation generation methods.
Abstract
Large language models can generate executable code for educational animations, but the resulting renders often exhibit visual defects, including element overlap, misalignment, and broken animation continuity. These defects cannot be reliably detected from the code alone and become apparent only after execution. We formalize this problem as render-feedback-aware constrained code generation: given a natural language specification, the model must generate executable code whose rendered output satisfies structured quality criteria that can be evaluated only after rendering. To address this problem, we introduce OmniManim, a render-feedback-aware educational animation generation framework built around a shared scene state, explicit visual planning, structured post-render diagnostics, and localized repair. Within OmniManim, the Vision Agent is a task-specific visual planning module: it…
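The generate, diagnose, and repair cycle described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Box` type, the functions `diagnose`, `repair`, and `refine`, and the rule of shifting an element to the right are all hypothetical, and the rendering step is replaced by hand-written bounding boxes that stand in for a shared scene state. It shows only how a post-render check for element overlap could drive a repair that touches just the offending element.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical scene-state entry: an element's axis-aligned bounding box."""
    name: str
    x: float  # left edge
    y: float  # bottom edge
    w: float  # width
    h: float  # height


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes intersect with positive area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def diagnose(scene: list[Box]) -> list[tuple[str, str]]:
    """Structured diagnostic: name pairs of elements whose boxes overlap."""
    issues = []
    for i, a in enumerate(scene):
        for b in scene[i + 1:]:
            if overlaps(a, b):
                issues.append((a.name, b.name))
    return issues


def repair(scene: list[Box], issues: list[tuple[str, str]], gap: float = 0.1) -> None:
    """Localized repair (assumed rule): move only the second element of each
    overlapping pair to the right of the first, leaving everything else alone."""
    by_name = {box.name: box for box in scene}
    for first, second in issues:
        a, b = by_name[first], by_name[second]
        b.x = a.x + a.w + gap


def refine(scene: list[Box], max_rounds: int = 5) -> list[tuple[str, str]]:
    """Alternate diagnosis and repair until clean or the round budget runs out.
    Returns whatever issues remain."""
    for _ in range(max_rounds):
        issues = diagnose(scene)
        if not issues:
            break
        repair(scene, issues)
    return diagnose(scene)


scene = [Box("title", 0.0, 0.0, 2.0, 1.0), Box("formula", 1.0, 0.5, 2.0, 1.0)]
initial_issues = diagnose(scene)
remaining = refine(scene)
```

In a real pipeline the boxes would come from the rendered frames rather than being written by hand, and the repair would edit the generated code instead of mutating coordinates directly.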
