TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding

Max Ku; Thomas Chong; Jonathan Leung; Krish Shah; Alvin Yu; Wenhu Chen

arXiv:2502.19400·cs.AI·May 27, 2025

TL;DR

This paper introduces TheoremExplainAgent, a system for generating detailed video explanations of theorems using LLMs and animations, and evaluates it with a new benchmark and metrics, revealing the importance of multimodal explanations for understanding.

Contribution

It presents TheoremExplainAgent for multimodal theorem explanations and TheoremExplainBench for systematic evaluation, advancing the generation of pedagogically meaningful visual explanations.

Findings

01. Agentic planning improves long-form video generation.

02. The o3-mini agent achieves a 93.8% success rate.

03. Generated videos show minor visual-layout issues, while multimodal explanations expose reasoning flaws that text-only explanations miss.

Abstract

Understanding domain-specific theorems often requires more than just text-based reasoning; effective communication through structured visual explanations is crucial for deeper comprehension. While large language models (LLMs) demonstrate strong performance in text-based theorem reasoning, their ability to generate coherent and pedagogically meaningful visual explanations remains an open challenge. In this work, we introduce TheoremExplainAgent, an agentic approach for generating long-form theorem explanation videos (over 5 minutes) using Manim animations. To systematically evaluate multimodal theorem explanations, we propose TheoremExplainBench, a benchmark covering 240 theorems across multiple STEM disciplines, along with 5 automated evaluation metrics. Our results reveal that agentic planning is essential for generating detailed long-form videos, and the o3-mini agent achieves a…
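
As a rough illustration of what "agentic planning" for a long-form video could look like, the sketch below splits a theorem into scenes, generates animation code per scene, and retries when a scene fails to render. This is not the authors' implementation: the names `Scene`, `plan_scenes`, `explain`, and `success_rate`, the three-part storyboard, and the retry limit are all hypothetical, and the LLM and Manim renderer are replaced by plain callables so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Scene:
    title: str
    code: Optional[str] = None  # animation source; stays None until a candidate renders


def plan_scenes(theorem: str) -> List[Scene]:
    """Hypothetical planner stub; a real agent would ask an LLM for a storyboard."""
    parts = ("statement", "intuition", "proof sketch")
    return [Scene(f"{theorem}: {part}") for part in parts]


def explain(
    theorem: str,
    generate: Callable[[str, int], str],  # (scene title, attempt index) -> candidate code
    render_ok: Callable[[str], bool],     # stand-in for "did the renderer accept this code?"
    max_retries: int = 3,                 # assumed limit, not taken from the paper
) -> bool:
    """Return True only if every planned scene renders within max_retries attempts."""
    for scene in plan_scenes(theorem):
        for attempt in range(max_retries):
            candidate = generate(scene.title, attempt)
            if render_ok(candidate):
                scene.code = candidate
                break
        else:
            # No attempt rendered, so the whole video counts as a failure.
            return False
    return True


def success_rate(results: List[bool]) -> float:
    """Fraction of theorems for which a complete video was produced."""
    return sum(results) / len(results)


if __name__ == "__main__":
    # Toy generator: the first attempt is always broken, the second one renders.
    gen = lambda title, attempt: f"ok:{title}" if attempt > 0 else "broken"
    ok = lambda code: code.startswith("ok:")
    outcomes = [explain(t, gen, ok) for t in ("Pythagorean theorem", "Bayes' theorem")]
    print(success_rate(outcomes))
```

In this toy setup a single failed scene fails the whole video, which is one possible reading of a per-theorem success rate; the benchmark's actual definition may differ.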

Code & Models

Datasets

TIGER-Lab/TheoremExplainBench
dataset · 23 downloads

Videos

TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Software Engineering Research