Enhancing Presentation Slide Generation by LLMs with a Multi-Staged End-to-End Approach

Sambaran Bandyopadhyay; Himanshu Maheshwari; Anandhavelu Natarajan; Apoorv Saxena

arXiv:2406.06556·cs.CL·June 12, 2024·1 cites

Open Access

TL;DR

This paper introduces a multi-staged end-to-end model combining large language and vision-language models to improve automatic presentation slide generation from long documents, outperforming existing prompt-based methods.

Contribution

The paper presents a novel multi-staged approach integrating LLMs and VLMs for better slide generation, addressing limitations of previous semi-automatic and flat summarization methods.

Findings

01 Outperforms state-of-the-art prompting methods in automated metrics
02 Receives higher scores in human evaluation
03 Effectively incorporates multimodal elements into slides

Abstract

Generating presentation slides from a long document with multimodal elements such as text and images is an important task. Done manually, it is time-consuming and requires domain expertise. Existing approaches for generating a rich presentation from a document are often semi-automatic or only put a flat summary into the slides, ignoring the importance of a good narrative. In this paper, we address this research gap by proposing a multi-staged end-to-end model that uses a combination of an LLM and a VLM. We show experimentally that, compared to applying LLMs directly with state-of-the-art prompting, our proposed multi-staged solution performs better on both automated metrics and human evaluation.

Taxonomy

Topics: Data Visualization and Analytics · Video Analysis and Summarization · Multimedia Communication and Technology