Unsupervised Audio-Visual Lecture Segmentation

Darshan Singh S; Anchit Gupta; C. V. Jawahar; Makarand Tapaswi

arXiv:2210.16644 · cs.CV · November 1, 2022 · 1 citation

TL;DR

This paper introduces AVLectures, a large-scale educational dataset, and proposes an unsupervised method that segments lecture videos into topics using multimodal cues, with the aim of improving navigation and learner engagement.

Contribution

The paper presents a new dataset for educational videos and a novel unsupervised segmentation method leveraging visual, textual, and OCR data.

Findings

01 Outperforms baseline methods on 15 courses

02 Effective multimodal feature matching for segmentation

03 Key factors identified through ablation studies

Abstract

Over the last decade, online lecture videos have become increasingly popular and have experienced a meteoric rise during the pandemic. However, video-language research has primarily focused on instructional videos or movies, and tools to help students navigate the growing online lectures are lacking. Our first contribution is to facilitate research in the educational domain, by introducing AVLectures, a large-scale dataset consisting of 86 courses with over 2,350 lectures covering various STEM subjects. Each course contains video lectures, transcripts, OCR outputs for lecture frames, and optionally lecture notes, slides, assignments, and related educational content that can inspire a variety of tasks. Our second contribution is introducing video lecture segmentation that splits lectures into bite-sized topics that show promise in improving learner engagement. We formulate lecture…

Code & Models

Repositories

Darshansingh11/AVLectures
pytorch

Videos

Unsupervised Audio-Visual Lecture Segmentation · YouTube

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Advanced Image Processing Techniques

Methods: Contrastive Language-Image Pre-training