PreGenie: An Agentic Framework for High-quality Visual Presentation Generation

Xiaojie Xu; Xinli Xu; Sirui Chen; Haoyu Chen; Fan Zhang; Ying-Cong Chen

arXiv:2505.21660 · cs.LG · September 3, 2025

TL;DR

PreGenie is a modular, agentic framework that uses multimodal large language models (MLLMs) to generate well-organized visual presentations through analysis, initial generation, and iterative review. It targets the layout, summarization, and image-understanding problems of earlier deep learning approaches.

Contribution

The paper introduces PreGenie, a two-stage framework built on multimodal LLMs that aims to improve the visual quality and coherence of generated presentations, addressing earlier limitations in layout organization, text summarization, and image understanding.

Findings

01 Outperforms existing models in aesthetics and content consistency

02 Demonstrates superior multimodal understanding capabilities

03 Produces presentations that align closely with human design preferences

Abstract

Visual presentations are vital for effective communication. Early attempts to automate their creation using deep learning often faced issues such as poorly organized layouts, inaccurate text summarization, and a lack of image understanding, leading to mismatched visuals and text. These limitations restrict their application in formal contexts like business and scientific research. To address these challenges, we propose PreGenie, an agentic and modular framework powered by multimodal large language models (MLLMs) for generating high-quality visual presentations. PreGenie is built on the Slidev presentation framework, where slides are rendered from Markdown code. It operates in two stages: (1) Analysis and Initial Generation, which summarizes multimodal input and generates initial code, and (2) Review and Re-generation, which iteratively reviews intermediate code and rendered slides to…
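The two stages described in the abstract can be sketched as a generate-then-review loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `generate_presentation`, `Review`, `summarize`, `generate`, `review`, and `max_rounds` are hypothetical, the MLLM calls are replaced by plain Python callables, and the review step here inspects only the Markdown code, whereas the abstract says PreGenie also reviews the rendered slides.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    """Outcome of one review pass (hypothetical structure)."""
    approved: bool
    feedback: str


def generate_presentation(
    source: str,
    summarize: Callable[[str], str],
    generate: Callable[[str, str], str],
    review: Callable[[str], Review],
    max_rounds: int = 3,
) -> str:
    """Run a two-stage loop and return Slidev-style Markdown."""
    # Stage 1: Analysis and Initial Generation.
    summary = summarize(source)
    code = generate(summary, "")

    # Stage 2: Review and Re-generation, bounded by max_rounds.
    for _ in range(max_rounds):
        result = review(code)
        if result.approved:
            break
        code = generate(summary, result.feedback)
    return code


# Stand-in callables; a real system would call an MLLM here.
def demo_summarize(source: str) -> str:
    # Use the first line of the input as the "summary".
    return source.strip().splitlines()[0]


def demo_generate(summary: str, feedback: str) -> str:
    title = f"# {summary}"
    if feedback:
        # Slidev separates slides with a line containing '---'.
        return f"{title}\n\n---\n\n## Overview\n"
    return title


def demo_review(code: str) -> Review:
    # Toy criterion: approve once the deck has a slide separator.
    if "---" in code:
        return Review(approved=True, feedback="")
    return Review(approved=False, feedback="add a second slide")


if __name__ == "__main__":
    slides = generate_presentation(
        "PreGenie\nbody text", demo_summarize, demo_generate, demo_review
    )
    print(slides)
```

Passing the stages in as callables keeps the loop independent of any particular model API, and the `max_rounds` cap is an assumption added so that the review loop always terminates.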

Taxonomy

Topics: Multimodal Machine Learning Applications · Data Visualization and Analytics · Generative Adversarial Networks and Image Synthesis