Beyond End-to-End Video Models: An LLM-Based Multi-Agent System for Educational Video Generation
Lingyong Yan, Jiulong Wu, Dong Xie, Weixian Shi, Deguo Xia, Jizhou Huang

TL;DR
LAVES is a hierarchical multi-agent system that uses large language models to generate pedagogically coherent educational videos with high procedural fidelity, lower production cost, and automated production.
Contribution
The paper introduces LAVES, a novel multi-agent framework that decomposes educational video generation into specialized agents coordinated by an orchestrator, improving fidelity and controllability over prior end-to-end models.
Findings
Generates over one million videos per day in large-scale deployment.
Reduces production costs by over 95% compared to industry standards.
Maintains high acceptance rates for generated educational videos.
Abstract
Although recent end-to-end video generation models demonstrate impressive performance in visually oriented content creation, they remain limited in scenarios that require strict logical rigor and precise knowledge representation, such as instructional and educational media. To address this problem, we propose LAVES, a hierarchical LLM-based multi-agent system for generating high-quality instructional videos from educational problems. LAVES formulates educational video generation as a multi-objective task that simultaneously demands correct step-by-step reasoning, pedagogically coherent narration, semantically faithful visual demonstrations, and precise audio-visual alignment. To address the limitations of prior approaches (low procedural fidelity, high production cost, and limited controllability), LAVES decomposes the generation workflow into specialized agents…
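The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the agent names (solver_agent, narration_agent, visual_agent, alignment_agent), the VideoDraft record, and the Orchestrator class are all hypothetical, chosen only to mirror the four objectives the abstract lists, and each agent is a stub where a real system would presumably call an LLM or a rendering tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class VideoDraft:
    """Hypothetical shared state passed between agents."""
    problem: str
    steps: List[str] = field(default_factory=list)
    narration: List[str] = field(default_factory=list)
    visuals: List[str] = field(default_factory=list)
    timeline: List[Tuple[str, str]] = field(default_factory=list)


def solver_agent(draft: VideoDraft) -> VideoDraft:
    # Stub for step-by-step reasoning (an LLM call in a real system).
    draft.steps = [
        f"Read the problem: {draft.problem}",
        "Work through the solution",
        "State the answer",
    ]
    return draft


def narration_agent(draft: VideoDraft) -> VideoDraft:
    # Stub for narration: one spoken line per reasoning step.
    draft.narration = [f"Narration for: {step}" for step in draft.steps]
    return draft


def visual_agent(draft: VideoDraft) -> VideoDraft:
    # Stub for visual demonstrations: one scene id per reasoning step.
    draft.visuals = [f"scene_{i}" for i in range(len(draft.steps))]
    return draft


def alignment_agent(draft: VideoDraft) -> VideoDraft:
    # Stub for audio-visual alignment: pair each narration line with its scene.
    draft.timeline = list(zip(draft.narration, draft.visuals))
    return draft


class Orchestrator:
    """Hypothetical coordinator that runs the agents in a fixed order."""

    def __init__(self, agents: List[Callable[[VideoDraft], VideoDraft]]):
        self.agents = agents

    def run(self, problem: str) -> VideoDraft:
        draft = VideoDraft(problem=problem)
        for agent in self.agents:
            draft = agent(draft)
        return draft


PIPELINE = [solver_agent, narration_agent, visual_agent, alignment_agent]

if __name__ == "__main__":
    result = Orchestrator(PIPELINE).run("Solve 2x + 3 = 7")
    for spoken, scene in result.timeline:
        print(scene, "|", spoken)
```

Keeping each agent a plain function over a shared record is one simple way to make individual stages replaceable and inspectable; the paper's actual coordination strategy may differ.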
Taxonomy
Topics: Human Motion and Animation · Multimodal Machine Learning Applications · Artificial Intelligence in Games
