SlideGen: Collaborative Multimodal Agents for Scientific Slide Generation
Xin Liang, Xiang Zhang, Yiwei Xu, Siqi Sun, Chenyu You

TL;DR
SlideGen is a modular, multimodal system that uses collaborative vision-language agents to generate high-quality, visually appealing scientific presentation slides from papers, surpassing existing methods in quality and faithfulness.
Contribution
We introduce SlideGen, a novel agent-based framework that integrates multimodal reasoning and visual planning for automated scientific slide generation.
Findings
Outperforms existing methods in visual quality and content faithfulness
Produces slides with logical flow and expert-level visual presentation
Establishes a new state-of-the-art in automated slide creation
Abstract
Generating academic slides from scientific papers is a challenging multimodal reasoning task that requires both long-context understanding and deliberate visual planning. Existing approaches largely reduce it to text-only summarization, overlooking the visual component and design-intensive nature of slide creation. In this paper, we introduce SlideGen, an agentic, modular, and visual-in-the-loop framework for scientific paper-to-slide generation. SlideGen orchestrates a group of vision-language agents that reason collaboratively over the document structure and semantics, producing editable PPTX slides with logical flow and compelling visual presentation. By integrating coordinated outlining, mapping, arrangement, note synthesis, and iterative refinement, our system consistently delivers slides of expert-level quality. Across diverse benchmarks and strong baselines, SlideGen outperforms…
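The abstract names five coordinated stages (outlining, mapping, arrangement, note synthesis, iterative refinement) without giving implementation details. The following is only a minimal sketch of how such a staged pipeline might be wired together, not the authors' implementation. Every name here (Slide, outline, map_figures, arrange, synthesize_notes, refine, generate_slides) is hypothetical, and each stage is a simple rule-based stand-in for what the paper describes as a vision-language agent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Slide:
    """Hypothetical slide record passed between stages."""
    title: str
    bullets: List[str] = field(default_factory=list)
    figures: List[str] = field(default_factory=list)
    layout: str = "text_only"
    notes: str = ""


def outline(sections: Dict[str, str]) -> List[Slide]:
    # Outlining stand-in: one slide per section, one bullet per sentence.
    slides = []
    for title, text in sections.items():
        bullets = [s.strip() for s in text.split(".") if s.strip()]
        slides.append(Slide(title=title, bullets=bullets))
    return slides


def map_figures(slides: List[Slide], figures: Dict[str, str]) -> List[Slide]:
    # Mapping stand-in: `figures` maps a figure id to the section it belongs to.
    for slide in slides:
        slide.figures = [fid for fid, sec in figures.items() if sec == slide.title]
    return slides


def arrange(slides: List[Slide]) -> List[Slide]:
    # Arrangement stand-in: pick a layout name based on whether figures exist.
    for slide in slides:
        slide.layout = "text_and_figure" if slide.figures else "text_only"
    return slides


def synthesize_notes(slides: List[Slide]) -> List[Slide]:
    # Note-synthesis stand-in: speaker notes are the bullets joined as sentences.
    for slide in slides:
        slide.notes = " ".join(b + "." for b in slide.bullets)
    return slides


def refine(slides: List[Slide], max_bullets: int = 3, max_rounds: int = 5) -> List[Slide]:
    # Iterative-refinement stand-in: a placeholder "critic" flags slides with
    # too many bullets, and each round trims one bullet from each flagged slide.
    for _ in range(max_rounds):
        overflowing = [s for s in slides if len(s.bullets) > max_bullets]
        if not overflowing:
            break
        for s in overflowing:
            s.bullets.pop()
    return slides


def generate_slides(sections: Dict[str, str], figures: Dict[str, str]) -> List[Slide]:
    # Run the stages in the order the abstract lists them.
    slides = outline(sections)
    slides = map_figures(slides, figures)
    slides = arrange(slides)
    slides = synthesize_notes(slides)
    return refine(slides)


if __name__ == "__main__":
    deck = generate_slides(
        {"Method": "A. B. C. D. E", "Results": "X. Y"},
        {"fig1": "Method"},
    )
    for slide in deck:
        print(slide.title, slide.layout, slide.bullets)
```

In this sketch the refinement loop checks only a bullet count. The abstract's "visual-in-the-loop" wording suggests the real system inspects rendered slides, which this stand-in does not attempt.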
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Artificial Intelligence in Games
