Augmented Segmentation and Visualization for Presentation Videos

Alexander Haubold; John R. Kender

arXiv:cs/0501044 · cs.MM · May 23, 2007

Open Access

TL;DR

This paper presents a method for segmenting, visualizing, and indexing presentation videos by combining audio speaker segmentation with key phrase extraction and visual dissimilarity-based video segmentation, enhanced by an interactive interface.

Contribution

It introduces a novel integrated approach for presentation video analysis that combines audio and visual data with interactive visualization tools.

Findings

01

The audio track is segmented by speaker and augmented with key phrases extracted by an Automatic Speech Recognizer (ASR).

02

The video track is segmented by visual dissimilarities and augmented with representative key frames.

03

Prototype interface for multi-modal presentation navigation.

Abstract

We investigate methods of segmenting, visualizing, and indexing presentation videos by separately considering audio and visual data. The audio track is segmented by speaker, and augmented with key phrases which are extracted using an Automatic Speech Recognizer (ASR). The video track is segmented by visual dissimilarities and augmented by representative key frames. An interactive user interface combines a visual representation of audio, video, text, and key frames, and allows the user to navigate a presentation video. We also explore clustering and labeling of speaker data and present preliminary results.
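As an illustration of the video-track step described above, the following is a minimal sketch of segmentation by visual dissimilarity with one key frame chosen per segment. The abstract does not specify the dissimilarity measure, threshold, or key-frame rule, so everything here is an assumption made for illustration: frames are modeled as flat lists of grayscale values (0 to 255), dissimilarity is half the L1 distance between intensity histograms, and the key frame is simply the middle frame of each segment. The function names `histogram`, `dissimilarity`, and `segment` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def histogram(frame: Sequence[int], bins: int = 8) -> List[float]:
    """Normalized intensity histogram of a frame given as flat 0-255 pixel values."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame) or 1  # avoid division by zero on an empty frame
    return [c / total for c in counts]


def dissimilarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Half the L1 distance between two histograms: 0 = identical, 1 = disjoint."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def segment(frames: Sequence[Sequence[int]],
            threshold: float = 0.5) -> List[Tuple[int, int, int]]:
    """Split frames wherever consecutive histograms differ by more than threshold.

    Returns one (start, end_exclusive, key_frame_index) tuple per segment.
    The key frame is the middle frame of the segment (an assumed heuristic).
    """
    if not frames:
        return []
    hists = [histogram(f) for f in frames]
    starts = [0]
    for i in range(1, len(hists)):
        if dissimilarity(hists[i - 1], hists[i]) > threshold:
            starts.append(i)
    ends = starts[1:] + [len(frames)]
    return [(s, e, (s + e - 1) // 2) for s, e in zip(starts, ends)]


if __name__ == "__main__":
    # Three dark frames followed by two bright frames (toy data).
    toy_frames = [[10] * 16] * 3 + [[200] * 16] * 2
    print(segment(toy_frames))
```

On real presentation video, where slide changes are often subtle, a fixed threshold on global histograms would likely need tuning or replacement with a measure suited to slide content; the sketch only shows the overall shape of the approach.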

Taxonomy

Topics: Video Analysis and Summarization · Music and Audio Processing · Multimedia Communication and Technology