MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features

Katharina Anderer; Andreas Reich; Matthias Wölfel

arXiv:2409.16765 · cs.CV · September 26, 2024

TL;DR

This paper introduces MaViLS, a benchmark dataset for video-to-slide alignment, and proposes a multimodal algorithm that combines speech, OCR, and visual features. The algorithm reaches an average accuracy of 0.82, compared with 0.56 for SIFT, while running approximately 11 times faster.
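
To make the idea of combining modalities concrete, here is a minimal sketch of one plausible fusion step. It assumes each modality (speech, OCR, visual) has already produced a frame-by-slide similarity matrix and merges them with a normalized weighted sum. The function name `fuse_similarities`, the weights, and the weighted-sum rule itself are illustrative assumptions, not the authors' documented method.

```python
import numpy as np

def fuse_similarities(matrices, weights=None):
    """Combine per-modality similarity matrices into one matrix.

    matrices: list of (n_frames, n_slides) arrays, one per modality.
    weights:  optional list of non-negative weights (hypothetical values);
              defaults to equal weighting.
    """
    stacked = np.stack([np.asarray(m, dtype=float) for m in matrices])
    if weights is None:
        weights = np.ones(len(stacked))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1
    # Weighted sum over the modality axis -> (n_frames, n_slides)
    return np.tensordot(w, stacked, axes=1)
```

Giving a larger weight to the OCR matrix would reflect the reported finding that OCR features contribute most to matching accuracy, though the actual weighting used in the paper is not stated here.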

Contribution

The paper contributes two things: the MaViLS benchmark dataset and a multimodal alignment algorithm that outperforms a SIFT baseline in both accuracy and speed when aligning lecture videos with slides.

Findings

01 OCR features contribute the most to matching accuracy, followed by image features.

02 The algorithm is approximately 11 times faster than SIFT.

03 Penalizing slide transitions improves alignment accuracy.

Abstract

This paper presents a benchmark dataset for aligning lecture videos with corresponding slides and introduces a novel multimodal algorithm leveraging features from speech, text, and images. The algorithm achieves an average accuracy of 0.82, compared with 0.56 for SIFT, while being approximately 11 times faster. It uses dynamic programming to determine the optimal slide sequence. The results show that penalizing slide transitions increases accuracy. Features obtained via optical character recognition (OCR) contribute the most to a high matching accuracy, followed by image features. The findings also show that audio transcripts alone provide valuable information for alignment and are beneficial if OCR data is lacking. Variations in matching accuracy across different lectures highlight the challenges associated with video quality and lecture style. The novel multimodal algorithm…
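
The dynamic-programming step with a transition penalty can be sketched as follows. This is a minimal Viterbi-style illustration, assuming a frame-by-slide similarity matrix is already available and that the penalty grows linearly with the size of the slide jump. The function name `align_slides`, the parameter `jump_penalty`, and the linear penalty form are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def align_slides(similarity, jump_penalty=0.1):
    """Pick one slide per frame, maximizing similarity minus jump penalties.

    similarity:   (n_frames, n_slides) array, higher means a better match.
    jump_penalty: hypothetical cost per slide index skipped between frames.
    Returns a list with one slide index per frame.
    """
    sim = np.asarray(similarity, dtype=float)
    n_frames, n_slides = sim.shape
    idx = np.arange(n_slides)
    # trans[i, j]: (negative) cost of moving from slide i to slide j
    trans = -jump_penalty * np.abs(idx[:, None] - idx[None, :])

    score = np.empty_like(sim)                    # best total score ending at (t, j)
    back = np.zeros((n_frames, n_slides), dtype=int)  # best predecessor slide
    score[0] = sim[0]
    for t in range(1, n_frames):
        cand = score[t - 1][:, None] + trans      # cand[i, j]: come from i, land on j
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + sim[t]

    # Backtrack from the best final slide
    path = [int(score[-1].argmax())]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With `jump_penalty=0`, the result reduces to a per-frame argmax, so a single noisy frame can cause a spurious slide switch. A positive penalty suppresses such switches, which is consistent with the reported finding that penalizing slide transitions increases accuracy.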

Code & Models

Repositories

andererka/mavils (Official)
