PreGenie: An Agentic Framework for High-quality Visual Presentation Generation
Xiaojie Xu, Xinli Xu, Sirui Chen, Haoyu Chen, Fan Zhang, Ying-Cong Chen

TL;DR
PreGenie is a modular, agentic framework that uses multimodal large language models (MLLMs) to generate well-organized visual presentations. It works through iterative analysis, generation, and review, improving on earlier deep-learning approaches.
Contribution
The paper introduces PreGenie, a two-stage framework based on multimodal LLMs that improves the quality and coherence of visual presentations. It addresses the limitations of prior work in layout organization, text summarization, and image understanding.
Findings
Outperforms existing models in aesthetics and content consistency
Demonstrates superior multimodal understanding capabilities
Produces presentations that align closely with human design preferences
Abstract
Visual presentations are vital for effective communication. Early attempts to automate their creation using deep learning often faced issues such as poorly organized layouts, inaccurate text summarization, and a lack of image understanding, leading to mismatched visuals and text. These limitations restrict their application in formal contexts like business and scientific research. To address these challenges, we propose PreGenie, an agentic and modular framework powered by multimodal large language models (MLLMs) for generating high-quality visual presentations. PreGenie is built on the Slidev presentation framework, where slides are rendered from Markdown code. It operates in two stages: (1) Analysis and Initial Generation, which summarizes multimodal input and generates initial code, and (2) Review and Re-generation, which iteratively reviews intermediate code and rendered slides to…
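
The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `Slide`, `render_slidev_markdown`, `review_and_regenerate`, `toy_reviewer`, and `toy_reviser` are all hypothetical, and the reviewer here is a simple rule (a bullet-count limit) standing in for the MLLM reviewer that PreGenie applies to intermediate code and rendered slides. The only Slidev detail relied on is that slides in a Markdown deck are separated by `---` lines.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

MAX_BULLETS = 3  # assumed readability limit, used only for this toy reviewer


@dataclass
class Slide:
    title: str
    bullets: List[str]


def render_slidev_markdown(slides: List[Slide]) -> str:
    """Stage 1 output: join slides into one Markdown deck, separated by '---'."""
    parts = []
    for slide in slides:
        body = "\n".join(f"- {b}" for b in slide.bullets)
        parts.append(f"# {slide.title}\n\n{body}")
    return "\n\n---\n\n".join(parts) + "\n"


def toy_reviewer(slides: List[Slide]) -> Optional[int]:
    """Hypothetical stand-in for an MLLM reviewer: return the index of the
    first overcrowded slide, or None if the deck passes review."""
    for i, slide in enumerate(slides):
        if len(slide.bullets) > MAX_BULLETS:
            return i
    return None


def toy_reviser(slides: List[Slide], index: int) -> List[Slide]:
    """Hypothetical re-generation step: split the flagged slide in two."""
    s = slides[index]
    head = Slide(s.title, s.bullets[:MAX_BULLETS])
    tail = Slide(s.title + " (cont.)", s.bullets[MAX_BULLETS:])
    return slides[:index] + [head, tail] + slides[index + 1:]


def review_and_regenerate(
    slides: List[Slide],
    reviewer: Callable[[List[Slide]], Optional[int]],
    reviser: Callable[[List[Slide], int], List[Slide]],
    max_rounds: int = 3,
) -> List[Slide]:
    """Stage 2: review the deck and revise it until the reviewer is
    satisfied or the round budget runs out."""
    for _ in range(max_rounds):
        feedback = reviewer(slides)
        if feedback is None:
            break
        slides = reviser(slides, feedback)
    return slides


if __name__ == "__main__":
    draft = [Slide("Intro", ["a", "b", "c", "d", "e"])]
    final = review_and_regenerate(draft, toy_reviewer, toy_reviser)
    print(render_slidev_markdown(final))
```

Capping the loop with `max_rounds` is a design choice of this sketch so that a reviewer that never approves cannot loop forever. The abstract does not say how PreGenie decides when to stop iterating.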
Taxonomy
Topics: Multimodal Machine Learning Applications · Data Visualization and Analytics · Generative Adversarial Networks and Image Synthesis
