Preacher: Paper-to-Video Agentic System
Jingwei Liu, Ling Yang, Hao Luo, Fan Wang, Hongyan Li, Mengdi Wang

TL;DR
Preacher is an agentic system that converts research papers into coherent, stylistically diverse video abstracts. It decomposes and summarizes the paper, plans key scenes iteratively, and synthesizes video segments, and is reported to surpass current video generation models in quality and in representing domain-specific knowledge.
Contribution
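Preacher pairs top-down paper decomposition with bottom-up video generation and uses a Progressive Chain of Thought (P-CoT) for granular, iterative planning. The design targets the limited context windows, rigid duration constraints, limited stylistic diversity, and missing domain knowledge of existing video generation models.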
Findings
Successfully generates high-quality video abstracts across five research fields.
Outperforms state-of-the-art video generation models in coherence and diversity.
Demonstrates domain-specific knowledge representation in video abstracts.
Abstract
The paper-to-video task converts a research paper into a structured video abstract, distilling key concepts, methods, and conclusions into an accessible, well-organized format. While state-of-the-art video generation models demonstrate potential, they are constrained by limited context windows, rigid video duration constraints, limited stylistic diversity, and an inability to represent domain-specific knowledge. To address these limitations, we introduce Preacher, the first paper-to-video agentic system. Preacher employs a top-down approach to decompose, summarize, and reformulate the paper, followed by bottom-up video generation, synthesizing diverse video segments into a coherent abstract. To align cross-modal representations, we define key scenes and introduce a Progressive Chain of Thought (P-CoT) for granular, iterative planning. Preacher successfully generates high-quality video…
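
The two-phase flow described in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `KeyScene`, `top_down_plan`, and `bottom_up_assemble` are hypothetical, word truncation stands in for an LLM summarizer, a text label stands in for a generated video clip, and P-CoT planning is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class KeyScene:
    """Hypothetical planning unit: one idea from the paper plus a visual style."""
    topic: str
    summary: str
    style: str


def top_down_plan(sections: Dict[str, str], max_words: int = 12) -> List[KeyScene]:
    """Decompose the paper by section and condense each section into a key scene.

    Truncating to max_words is a placeholder for a real summarizer (assumption).
    """
    scenes = []
    for topic, text in sections.items():
        summary = " ".join(text.split()[:max_words])
        # Placeholder style rule; a real planner would choose and refine this iteratively.
        style = "diagram" if topic.lower() == "method" else "slides"
        scenes.append(KeyScene(topic, summary, style))
    return scenes


def bottom_up_assemble(scenes: List[KeyScene]) -> List[str]:
    """Turn each scene into a segment and keep the segments in paper order.

    A string label stands in for a generated video clip (assumption).
    """
    return [f"[{s.style}] {s.topic}: {s.summary}" for s in scenes]


paper = {
    "Introduction": "Video abstracts make research papers easier to approach.",
    "Method": "Decompose the paper, plan key scenes, then generate segments.",
}
segments = bottom_up_assemble(top_down_plan(paper))
for segment in segments:
    print(segment)
```

Keeping planning (top-down) separate from rendering (bottom-up) means each scene can be generated in its own style and within its own context budget before the segments are joined, which is the property the abstract attributes to the system.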
