AI-Generated Lecture Slides for Improving Slide Element Detection and Retrieval
Suyash Maniyar, Vishvesh Trivedi, Ajoy Mondal, Anand Mishra, C.V. Jawahar

TL;DR
This paper introduces SynLecSlideGen, a synthetic lecture slide generator guided by large language models, which enhances slide element detection and retrieval by improving few-shot transfer learning on real lecture slide data.
Contribution
We propose a novel LLM-guided synthetic slide generation pipeline and create a benchmark, demonstrating synthetic data's effectiveness in improving slide understanding models.
Findings
Pretraining on synthetic slides improves few-shot transfer to real lecture slides compared with training on real data alone.
Pretraining on synthetic data enhances detection and retrieval accuracy.
Synthetic data reduces the need for extensive manual annotation.
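The few-shot setting mentioned above can be illustrated with a small sketch of how a k-shot subset of annotated real slides might be drawn before fine-tuning a synthetically pretrained model. This is an assumption for illustration only: the source does not say how few-shot subsets are selected, and the function name `sample_few_shot`, the `(slide_id, labels)` annotation format, and the class names in the usage example are all hypothetical.

```python
import random
from collections import defaultdict


def sample_few_shot(annotations, k, seed=0):
    """Pick up to k slides per element class (hypothetical helper).

    annotations: list of (slide_id, set_of_class_labels) pairs.
    Returns a sorted list of slide ids. A slide containing several
    classes may be picked for more than one of them, so the result
    can hold fewer than k * num_classes slides.
    """
    rng = random.Random(seed)

    # Group slide ids by every element class that appears on them.
    by_class = defaultdict(list)
    for slide_id, labels in annotations:
        for label in labels:
            by_class[label].append(slide_id)

    chosen = set()
    for label in sorted(by_class):  # sorted for reproducibility
        pool = list(by_class[label])
        rng.shuffle(pool)
        chosen.update(pool[:k])
    return sorted(chosen)


# Toy usage with made-up class names.
toy = [
    ("s1", {"title", "text"}),
    ("s2", {"title", "figure"}),
    ("s3", {"text"}),
    ("s4", {"figure", "table"}),
    ("s5", {"title"}),
]
subset = sample_few_shot(toy, k=1)
```

The selected slides would then serve as the small real training set, with the remaining real slides held out for evaluation.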
Abstract
Lecture slide element detection and retrieval are key problems in slide understanding. Training effective models for these tasks often depends on extensive manual annotation. However, annotating large volumes of lecture slides for supervised training is labor-intensive and requires domain expertise. To address this, we propose a large language model (LLM)-guided synthetic lecture slide generation pipeline, SynLecSlideGen, which produces high-quality, coherent, and realistic slides. We also create an evaluation benchmark, RealSlide, by manually annotating 1,050 real lecture slides. To assess the utility of our synthetic slides, we perform few-shot transfer learning on real data using models pre-trained on them. Experimental results show that few-shot transfer learning with pretraining on synthetic slides significantly improves performance compared to training only on real data. This…
Taxonomy
Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques
