Unsupervised Audio-Visual Lecture Segmentation
Darshan Singh S, Anchit Gupta, C. V. Jawahar, Makarand Tapaswi

TL;DR
This paper introduces AVLectures, a large-scale educational dataset, and proposes an unsupervised method for segmenting lecture videos into topics using multimodal cues, improving navigation and engagement.
Contribution
The paper presents a new dataset for educational videos and a novel unsupervised segmentation method leveraging visual, textual, and OCR data.
Findings
Outperforms baseline methods on 15 courses
Effective multimodal feature matching for segmentation
Key factors identified through ablation studies
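To make the idea of segmenting a lecture from multimodal features concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it assumes each short clip already has a visual, a transcript, and an OCR feature vector, fuses them by normalizing and concatenating (a hypothetical choice), and places topic boundaries where consecutive clips are least similar. The names `fuse_clip` and `segment_boundaries` are illustrative only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse_clip(visual, text, ocr):
    """Hypothetical fusion: normalize each modality, concatenate, renormalize."""
    fused = l2_normalize(visual) + l2_normalize(text) + l2_normalize(ocr)
    return l2_normalize(fused)


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def segment_boundaries(clips, num_segments):
    """Return the clip indices at which new segments start.

    Assumption: `clips` is a temporally ordered list of unit-length fused
    features, and the number of segments is chosen by the caller.
    """
    sims = [cosine(clips[i], clips[i + 1]) for i in range(len(clips) - 1)]
    k = max(num_segments - 1, 0)
    # Cut at the k transitions where adjacent clips are least similar.
    weakest = sorted(range(len(sims)), key=lambda i: sims[i])[:k]
    return sorted(i + 1 for i in weakest)


# Toy lecture: three clips on one topic followed by three on another.
topic_a = fuse_clip([1.0, 0.0], [1.0, 0.0], [1.0, 0.0])
topic_b = fuse_clip([0.0, 1.0], [0.0, 1.0], [0.0, 1.0])
lecture = [topic_a] * 3 + [topic_b] * 3
print(segment_boundaries(lecture, num_segments=2))  # [3]
```

Cutting at the weakest adjacent similarities keeps every segment contiguous in time, which is the property a lecture segmenter needs. A learned joint embedding and a clustering step could replace the hand-built fusion and the boundary rule used here.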
Abstract
Over the last decade, online lecture videos have become increasingly popular and have experienced a meteoric rise during the pandemic. However, video-language research has primarily focused on instructional videos or movies, and tools to help students navigate the growing online lectures are lacking. Our first contribution is to facilitate research in the educational domain, by introducing AVLectures, a large-scale dataset consisting of 86 courses with over 2,350 lectures covering various STEM subjects. Each course contains video lectures, transcripts, OCR outputs for lecture frames, and optionally lecture notes, slides, assignments, and related educational content that can inspire a variety of tasks. Our second contribution is introducing video lecture segmentation that splits lectures into bite-sized topics that show promise in improving learner engagement. We formulate lecture…
Taxonomy
Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Advanced Image Processing Techniques
Methods: Contrastive Language-Image Pre-training
