PresentAgent-2: Towards Generalist Multimodal Presentation Agents

Wei Wu; Ziyang Xu; Zeyu Zhang; Yang Zhao; Hao Tang

arXiv:2605.11363 · cs.CV · May 13, 2026

TL;DR

PresentAgent-2 is an agentic framework that generates multimodal presentation videos from open-ended user queries. It supports three modes (single presentation, discussion, and interaction) and comes with a new benchmark for evaluation.

Contribution

The paper introduces a unified framework for query-driven, multimodal presentation video generation that supports multiple presentation modes, along with a new benchmark for evaluating it.

Findings

01. Supports three presentation modes: single presentation, discussion, and interaction.

02. Generates multimodal content, including text, images, GIFs, and videos.

03. Establishes a multimodal presentation benchmark with diverse evaluation criteria.

Abstract

Presentation generation is moving beyond static slide creation toward end-to-end presentation video generation with research grounding, multimodal media, and interactive delivery. We introduce PresentAgent-2, an agentic framework for generating presentation videos from user queries. Given an open-ended user query and a selected presentation mode, PresentAgent-2 first summarizes the query into a focused topic and performs deep research over presentation-friendly sources to collect multimodal resources, including relevant text, images, GIFs, and videos. It then constructs presentation slides, generates mode-specific scripts, and composes slides, audio, and dynamic media into a complete presentation video. PresentAgent-2 supports three independent presentation modes within a unified framework: Single Presentation, which generates a single-speaker narrated presentation video; Discussion,…
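The stages named in the abstract (topic summarization, research, slide construction, mode-specific scripting, composition) can be pictured as a simple function pipeline. The sketch below is illustrative only and is not the authors' implementation: every name in it (`Mode`, `Resource`, `Slide`, `summarize_query`, `research`, `build_slides`, `write_scripts`, `compose`, `generate_presentation`) is hypothetical, each stage is a placeholder stub, and the selected mode does nothing more than tag the script text, since the listing does not describe how the modes differ internally.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    # The three modes named in the listing; the values are hypothetical labels.
    SINGLE = "single_presentation"
    DISCUSSION = "discussion"
    INTERACTION = "interaction"


@dataclass
class Resource:
    kind: str     # "text", "image", "gif", or "video"
    content: str  # placeholder payload


@dataclass
class Slide:
    title: str
    resources: list = field(default_factory=list)
    script: str = ""


def summarize_query(query: str) -> str:
    # Placeholder: only normalizes whitespace and truncates.
    return " ".join(query.split())[:80]


def research(topic: str) -> list:
    # Placeholder: one stub resource per media type listed in the abstract.
    kinds = ("text", "image", "gif", "video")
    return [Resource(kind, f"{kind} about {topic}") for kind in kinds]


def build_slides(topic: str, resources: list, per_slide: int = 2) -> list:
    # Group resources into slides, per_slide resources at a time.
    slides = []
    for start in range(0, len(resources), per_slide):
        number = start // per_slide + 1
        chunk = resources[start:start + per_slide]
        slides.append(Slide(title=f"{topic} ({number})", resources=chunk))
    return slides


def write_scripts(slides: list, mode: Mode) -> list:
    # Placeholder: the mode only tags the script text in this sketch.
    for slide in slides:
        slide.script = f"[{mode.value}] narration for {slide.title}"
    return slides


def compose(slides: list) -> list:
    # Placeholder for video composition: returns an ordered timeline of dicts.
    return [
        {
            "slide": slide.title,
            "audio": slide.script,
            "media": [resource.kind for resource in slide.resources],
        }
        for slide in slides
    ]


def generate_presentation(query: str, mode: Mode) -> list:
    topic = summarize_query(query)
    resources = research(topic)
    slides = write_scripts(build_slides(topic, resources), mode)
    return compose(slides)
```

Calling `generate_presentation("How do diffusion models work?", Mode.SINGLE)` returns a two-entry timeline, one entry per stub slide. In a real system each stub would presumably be replaced by a model call, a retrieval step, or a rendering step.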

Code & Models

Repositories

AIGeeksGroup/PresentAgent-2
github

Datasets

AIGeeksGroup/PresentEval
dataset · 419 dl
