SlidesGen-Bench: Evaluating Slides Generation via Computational and Quantitative Metrics

Yunqiao Yang; Wenbo Li; Houxing Ren; Zimu Lu; Ke Wang; Zhiyuan Huang; Zhuofan Zong; Mingjie Zhan; Hongsheng Li

arXiv:2601.09487·cs.CL·January 15, 2026

TL;DR

SlidesGen-Bench introduces a comprehensive, quantitative evaluation framework for slide generation systems, emphasizing universality, reproducibility, and alignment with human preferences, to address the challenges of assessing diverse LLM-based slide creation methods.

Contribution

The paper presents a unified benchmark grounded in the visual domain, with quantitative metrics and a human-aligned dataset, to evaluate slide generation systems more reliably.

Findings

01. Higher correlation with human preferences than existing methods

02. Quantitative assessment across Content, Aesthetics, and Editability

03. Effective evaluation across nine slide generation systems
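
The first finding refers to correlation with human preferences. This listing does not say which statistic the paper uses, so the following is only a minimal sketch of one common choice, Spearman's rank correlation, computed between an automatic metric and human ratings. The function names and the toy scores are hypothetical and are not taken from the paper or its dataset.

```python
from math import sqrt
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical toy values: one automatic score and one human rating per deck.
    metric_scores = [0.2, 0.5, 0.9, 0.7]
    human_ratings = [1, 2, 4, 3]
    print(spearman(metric_scores, human_ratings))
```

A value near 1 means the metric orders the decks the same way the human raters do; a value near -1 means it orders them in reverse.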

Abstract

The rapid evolution of Large Language Models (LLMs) has fostered diverse paradigms for automated slide generation, ranging from code-driven layouts to image-centric synthesis. However, evaluating these heterogeneous systems remains challenging, as existing protocols often struggle to provide comparable scores across architectures or rely on uncalibrated judgments. In this paper, we introduce SlidesGen-Bench, a benchmark designed to evaluate slide generation through a lens of three core principles: universality, quantification, and reliability. First, to establish a unified evaluation framework, we ground our analysis in the visual domain, treating terminal outputs as renderings to remain agnostic to the underlying generation method. Second, we propose a computational approach that quantitatively assesses slides across three distinct dimensions - Content, Aesthetics, and Editability -…

Code & Models

Datasets

Yqy6/Slides-Align
dataset · 10k downloads

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Digital Humanities and Scholarship