EduStory: A Unified Framework for Pedagogically-Consistent Multi-Shot STEM Instructional Video Generation
Xinyi Wu, Jayant Teotia, Shuai Zhao, Erik Cambria

TL;DR
EduStory is a framework for generating pedagogically coherent multi-shot STEM instructional videos, with an emphasis on knowledge consistency, structured control, and learning-oriented evaluation metrics.
Contribution
EduStory combines pedagogical state modeling and script-guided control in a single framework, and introduces EduVideoBench, a new benchmark for evaluating long-horizon instructional video generation in STEM.
Findings
Domain-aware state modeling reduces narrative breakdown.
Structured control improves alignment with instructional intent.
The EduVideoBench benchmark enables rigorous evaluation of pedagogical coherence.
Abstract
Long-horizon video generation has advanced in visual quality, yet existing methods still struggle to maintain knowledge consistency and coherent pedagogical narratives across multi-shot instructional videos, especially in STEM domains. To address these challenges, we propose EduStory, a unified framework for reliable instructional video generation. EduStory integrates pedagogical state modeling to track persistent knowledge states, script-guided structured control to organize multi-shot narratives, and learning-oriented evaluation metrics to assess knowledge fidelity and constraint satisfaction. To support rigorous evaluation, we further introduce EduVideoBench, a diagnostic benchmark with multi-granularity annotations, including pedagogical storyboards, shot-level semantics, and knowledge state transitions, together with baseline tasks for controllable instructional video generation.…
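To make the idea of tracking persistent knowledge states across shots concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `Shot` class, the `requires`/`introduces` fields, and both function names are hypothetical, and the "constraint" checked here (a concept must be introduced before a later shot relies on it) is only one assumed example of what a knowledge-consistency check could look like.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    """Hypothetical storyboard shot: concepts it relies on and concepts it teaches."""
    name: str
    requires: set = field(default_factory=set)
    introduces: set = field(default_factory=set)


def check_knowledge_consistency(shots):
    """Walk the shots in order while tracking the set of concepts taught so far.

    Returns (violations, final_state). Each violation is a pair
    (shot name, sorted list of concepts used before being introduced).
    """
    known = set()
    violations = []
    for shot in shots:
        missing = shot.requires - known
        if missing:
            violations.append((shot.name, sorted(missing)))
        # The knowledge state persists and grows from shot to shot.
        known |= shot.introduces
    return violations, known


def constraint_satisfaction_rate(shots):
    """Fraction of shots whose prerequisites were all introduced earlier (assumed metric)."""
    if not shots:
        return 1.0
    violations, _ = check_knowledge_consistency(shots)
    return 1.0 - len(violations) / len(shots)


# Illustrative storyboard: the third shot uses "acceleration" before it is taught.
storyboard = [
    Shot("define_force", introduces={"force"}),
    Shot("define_mass", introduces={"mass"}),
    Shot("second_law", requires={"force", "mass", "acceleration"}, introduces={"F=ma"}),
]

violations, final_state = check_knowledge_consistency(storyboard)
print(violations)
print(constraint_satisfaction_rate(storyboard))
```

In this toy storyboard the checker flags the last shot, since it depends on a concept that no earlier shot introduced. A real system would need a far richer state representation than a set of concept names; the sketch only illustrates the ordering logic.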
