Training and Agentic Inference Strategies for LLM-based Manim Animation Generation

Ravidu Suien Rammuni Silva; Ahmad Lotfi; Isibor Kennedy Ihianle; Golnaz Shahtahmassebi; Jordan J. Bird

arXiv:2604.18364 · cs.AI · April 21, 2026

TL;DR

This paper introduces a unified training and inference framework for LLMs to generate programmatic animations with Manim, combining fine-tuning, reinforcement learning, and agentic inference strategies to improve code quality and visual output.

Contribution

It presents the first comprehensive study of training and inference strategies for text-to-code-to-video generation with Manim, demonstrating their combined effectiveness.

Findings

01 SFT improves code quality in animation generation.

02 GRPO enhances visual output and responsiveness during inference.

03 Qwen 3 Coder 30B with GRPO and RITL-DOC achieved a 94% Render Success Rate.
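
For readers unfamiliar with GRPO, the sketch below illustrates its core idea: each sampled completion is scored relative to the other completions drawn for the same prompt, by normalising rewards within the group. The abstract describes a unified reward that fuses code and visual assessment signals but does not give its form here, so `fused_reward`, its weighted-sum shape, and the `w_code` weight are assumptions made purely for illustration, not the paper's formulation.

```python
from statistics import mean, pstdev


def fused_reward(code_score: float, visual_score: float, w_code: float = 0.5) -> float:
    """Hypothetical fusion of a code score and a visual score.

    ASSUMPTION: a convex combination; the paper's actual reward may differ.
    """
    return w_code * code_score + (1.0 - w_code) * visual_score


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalise rewards within one group of completions for a single prompt.

    Each advantage is (reward - group mean) / (group std + eps), so completions
    that beat their siblings get positive advantages and the rest get negative.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled animations for one prompt, as (code_score, visual_score).
group = [(1.0, 0.8), (1.0, 0.2), (0.0, 0.0), (0.5, 0.5)]
rewards = [fused_reward(c, v) for c, v in group]
advantages = group_relative_advantages(rewards)
```

Because the advantages are centred on the group mean, a group in which every completion earns the same reward yields all-zero advantages and so contributes no learning signal.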

Abstract

Generating programmatic animation using libraries such as Manim presents unique challenges for Large Language Models (LLMs), requiring spatial reasoning, temporal sequencing, and familiarity with domain-specific APIs that are underrepresented in general pre-training data. A systematic study of how training and inference strategies interact in this setting is lacking in current research. This study introduces ManimTrainer, a training pipeline that combines Supervised Fine-tuning (SFT) with Reinforcement Learning (RL) based Group Relative Policy Optimisation (GRPO) using a unified reward signal that fuses code and visual assessment signals, and ManimAgent, an inference pipeline featuring Renderer-in-the-loop (RITL) and API documentation-augmented RITL (RITL-DOC) strategies. Using these techniques, this study presents the first unified training and inference study for text-to-code-to-video…
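
A renderer-in-the-loop strategy can be pictured as a generate, render, repair cycle. The following is a minimal sketch under stated assumptions, not ManimAgent's implementation: `generate` and `render` are hypothetical callables standing in for the LLM call and the Manim render step, and the retry budget `max_attempts` is an illustrative parameter. A documentation-augmented variant would presumably also pass relevant API documentation to `generate`; that step is omitted here.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
#   generate(prompt, previous_code, error_log) -> new code string
#   render(code) -> (succeeded, log_text)
Generate = Callable[[str, Optional[str], Optional[str]], str]
Render = Callable[[str], Tuple[bool, str]]


def renderer_in_the_loop(
    prompt: str,
    generate: Generate,
    render: Render,
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Regenerate code until it renders or the attempt budget runs out.

    Returns (code, attempts_used); code is None if no attempt rendered.
    """
    code: Optional[str] = None
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        # Feed the previous code and renderer error back to the model.
        code = generate(prompt, code, error)
        ok, log = render(code)
        if ok:
            return code, attempt
        error = log
    return None, max_attempts


# Stub usage: the first draft fails, the repaired draft renders.
def stub_generate(prompt, previous_code, error_log):
    return "draft" if error_log is None else "repaired"


def stub_render(code):
    return (code == "repaired", "" if code == "repaired" else "NameError: Circl")


result = renderer_in_the_loop("Draw a circle", stub_generate, stub_render)
```

Keeping the renderer behind a plain callable makes the loop testable without a Manim installation; in practice `render` would run the generated scene and capture its error output.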
