Augmented Segmentation and Visualization for Presentation Videos
Alexander Haubold, John R. Kender

TL;DR
This paper presents methods for segmenting, visualizing, and indexing presentation videos. The audio track is segmented by speaker and annotated with key phrases extracted by speech recognition, the video track is segmented by visual dissimilarity, and an interactive interface combines both for navigation.
Contribution
The paper introduces an approach to presentation video analysis that treats audio and visual data separately and then combines them in an interactive visualization.
Findings
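Audio segmentation by speaker, augmented with key phrases extracted by ASR.
Video segmentation based on visual dissimilarities, augmented with representative key frames.
Prototype interface for multi-modal presentation navigation.
Preliminary results on clustering and labeling speaker data.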
Abstract
We investigate methods of segmenting, visualizing, and indexing presentation videos by separately considering audio and visual data. The audio track is segmented by speaker, and augmented with key phrases which are extracted using an Automatic Speech Recognizer (ASR). The video track is segmented by visual dissimilarities and augmented by representative key frames. An interactive user interface combines a visual representation of audio, video, text, and key frames, and allows the user to navigate a presentation video. We also explore clustering and labeling of speaker data and present preliminary results.
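The video-track step described in the abstract (segment by visual dissimilarity, then pick a representative key frame per segment) can be illustrated with a minimal sketch. The abstract does not specify the dissimilarity measure, the threshold, or how key frames are chosen, so everything here is an assumption made for illustration: frames are flat lists of grayscale intensities, dissimilarity is half the L1 distance between normalized intensity histograms, a boundary is placed wherever consecutive frames differ by more than a fixed threshold, and the key frame is simply the middle frame of each segment. The names `histogram`, `dissimilarity`, `segment`, and `key_frame` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Assumption: a frame is a non-empty flat sequence of grayscale values in 0..255.
Frame = Sequence[int]


def histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Normalized intensity histogram of one frame."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]


def dissimilarity(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance between two normalized histograms (0.0 to 1.0)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def segment(frames: Sequence[Frame], threshold: float = 0.5,
            bins: int = 8) -> List[Tuple[int, int]]:
    """Split frames into (start, end) index ranges, with end exclusive.

    A new segment starts wherever the dissimilarity between consecutive
    frames exceeds the (assumed) threshold.
    """
    if not frames:
        return []
    hists = [histogram(f, bins) for f in frames]
    boundaries = [0]
    for i in range(1, len(frames)):
        if dissimilarity(hists[i - 1], hists[i]) > threshold:
            boundaries.append(i)
    boundaries.append(len(frames))
    return list(zip(boundaries[:-1], boundaries[1:]))


def key_frame(segment_range: Tuple[int, int]) -> int:
    """Index of the middle frame of a segment (an assumed selection rule)."""
    start, end = segment_range
    return (start + end - 1) // 2


# Three dark frames followed by two bright frames: one boundary is expected.
frames = [[10] * 16 for _ in range(3)] + [[200] * 16 for _ in range(2)]
segments = segment(frames)
print(segments)                          # [(0, 3), (3, 5)]
print([key_frame(s) for s in segments])  # [1, 3]
```

A fixed global threshold is the simplest choice and tends to be fragile on real footage with gradual transitions or lighting changes; an adaptive threshold or a windowed comparison would be a natural refinement, though whether the paper uses either is not stated in the abstract.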
Taxonomy
Topics: Video Analysis and Summarization · Music and Audio Processing · Multimedia Communication and Technology
