Training and Agentic Inference Strategies for LLM-based Manim Animation Generation
Ravidu Suien Rammuni Silva, Ahmad Lotfi, Isibor Kennedy Ihianle, Golnaz Shahtahmassebi, Jordan J. Bird

TL;DR
This paper introduces a unified training and inference framework for LLMs to generate programmatic animations with Manim, combining fine-tuning, reinforcement learning, and agentic inference strategies to improve code quality and visual output.
Contribution
It presents the first comprehensive study of training and inference strategies for text-to-code-to-video generation with Manim, demonstrating their combined effectiveness.
Findings
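Supervised Fine-tuning (SFT) improves code quality in animation generation.
Group Relative Policy Optimisation (GRPO) enhances visual output and responsiveness during inference.
Qwen 3 Coder 30B with GRPO and RITL-DOC (documentation-augmented Renderer-in-the-loop inference) achieved a 94% Render Success Rate.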
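
For readers unfamiliar with GRPO, the sketch below illustrates its central idea: each sampled completion is scored relative to the other completions drawn for the same prompt. The normalisation (reward minus group mean, divided by group standard deviation) is the standard GRPO advantage. The `fused_reward` helper and its `w_code` weight are hypothetical. They only illustrate one simple way a code score and a visual score could be combined, and are not the reward used in the paper.

```python
from statistics import mean, pstdev


def fused_reward(code_score: float, visual_score: float, w_code: float = 0.5) -> float:
    """Hypothetical fusion of a code score and a visual score (both in [0, 1]).

    A plain weighted sum is an assumption made for illustration only.
    """
    return w_code * code_score + (1.0 - w_code) * visual_score


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standard GRPO-style advantage: normalise each reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: three completions sampled for one prompt, each with made-up scores.
group = [fused_reward(0.9, 0.4), fused_reward(0.5, 0.5), fused_reward(0.2, 0.1)]
advantages = group_relative_advantages(group)
```

Completions that score above their group's mean receive a positive advantage and are reinforced; those below receive a negative one. No separate value model is needed, which is the main practical appeal of the group-relative formulation.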
Abstract
Generating programmatic animation using libraries such as Manim presents unique challenges for Large Language Models (LLMs), requiring spatial reasoning, temporal sequencing, and familiarity with domain-specific APIs that are underrepresented in general pre-training data. A systematic study of how training and inference strategies interact in this setting is lacking in current research. This study introduces ManimTrainer, a training pipeline that combines Supervised Fine-tuning (SFT) with Reinforcement Learning (RL) based Group Relative Policy Optimisation (GRPO) using a unified reward signal that fuses code and visual assessment signals, and ManimAgent, an inference pipeline featuring Renderer-in-the-loop (RITL) and API documentation-augmented RITL (RITL-DOC) strategies. Using these techniques, this study presents the first unified training and inference study for text-to-code-to-video…
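
The Renderer-in-the-loop idea named in the abstract can be pictured as a retry loop: generate code, try to render it, and feed any error back to the model. The sketch below is a minimal illustration under stated assumptions, not the ManimAgent implementation. The callables `generate`, `render`, and `lookup_docs` are hypothetical placeholders, as is the `max_rounds` limit. Passing `lookup_docs` stands in for the documentation-augmented variant (RITL-DOC), and the way documentation is appended to the feedback is likewise assumed.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces (assumptions, not the paper's API):
#   generate(prompt, feedback) -> candidate Manim source code
#   render(code)               -> (success flag, error text if rendering failed)
#   lookup_docs(error)         -> API documentation text relevant to the error
Generate = Callable[[str, Optional[str]], str]
Render = Callable[[str], Tuple[bool, str]]
LookupDocs = Callable[[str], str]


def renderer_in_the_loop(
    prompt: str,
    generate: Generate,
    render: Render,
    lookup_docs: Optional[LookupDocs] = None,
    max_rounds: int = 3,
) -> Tuple[str, bool, int]:
    """Regenerate code until it renders or the round budget is exhausted.

    Returns (last generated code, whether it rendered, rounds used).
    """
    feedback: Optional[str] = None
    code = ""
    for round_idx in range(1, max_rounds + 1):
        code = generate(prompt, feedback)
        ok, error = render(code)
        if ok:
            return code, True, round_idx
        # Feed the renderer's error back into the next generation attempt.
        feedback = error
        if lookup_docs is not None:
            # Documentation-augmented variant: attach relevant API docs.
            feedback += "\n\nRelevant documentation:\n" + lookup_docs(error)
    return code, False, max_rounds
```

In practice, `render` might write the code to a file and invoke the Manim command-line tool, treating a non-zero exit status as failure. A Render Success Rate could then be estimated as the fraction of prompts for which this loop returns a successful render.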
