Analysis and Visualization of Index Words from Audio Transcripts of Instructional Videos
Alexander Haubold, John R. Kender

TL;DR
This paper presents techniques for extracting, analyzing, and visualizing key textual content from low-quality instructional videos using speech recognition and filtering, enabling better understanding and indexing of course materials.
Contribution
It introduces novel visualization tools and methods for analyzing transcripts from low-quality videos, including key phrase extraction and semantic clustering, with demonstrated effectiveness on multiple courses.
Findings
Up to 98 key terms extracted per transcript
Textbook match accuracy exceeds 70%
Effective visualization of course structure and topics
Abstract
We introduce new techniques for extracting, analyzing, and visualizing textual contents from instructional videos of low production quality. Using Automatic Speech Recognition, approximate transcripts (≈75% Word Error Rate) are obtained from the originally highly compressed videos of university courses, each comprising between 10 and 30 lectures. Text material in the form of books or papers that accompany the course is then used to filter meaningful phrases from the seemingly incoherent transcripts. The resulting index into the transcripts is tied together and visualized in 3 experimental graphs that help in understanding the overall course structure and provide a tool for localizing certain topics for indexing. We specifically discuss a Transcript Index Map, which graphically lays out key phrases for a course, a Textbook Chapter to Transcript Match, and finally a Lecture Transcript…
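The filtering step described in the abstract, where accompanying text material is used to pick meaningful phrases out of a noisy transcript, can be illustrated with a minimal sketch. This is not the authors' implementation: the n-gram matching strategy, the stopword list, and the function names (`build_phrase_vocabulary`, `filter_transcript`) are assumptions chosen for illustration only.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "that", "we", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, max_n=3):
    """Yield every run of 1..max_n consecutive tokens as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def build_phrase_vocabulary(reference_text, max_n=3):
    """Collect candidate phrases from the reference text (e.g. a textbook).

    Phrases that begin or end with a stopword are skipped.
    """
    vocab = set()
    for gram in ngrams(tokenize(reference_text), max_n):
        if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
            continue
        vocab.add(gram)
    return vocab


def filter_transcript(transcript, vocab, max_n=3):
    """Count transcript n-grams that also occur in the reference vocabulary."""
    counts = Counter()
    for gram in ngrams(tokenize(transcript), max_n):
        if gram in vocab:
            counts[" ".join(gram)] += 1
    return counts


if __name__ == "__main__":
    textbook = "A binary search tree stores keys in sorted order."
    # Hypothetical noisy ASR output with misrecognized words.
    transcript = "so the by nary search tree um stores keys binary search tree in order"
    vocab = build_phrase_vocabulary(textbook)
    for phrase, count in filter_transcript(transcript, vocab).most_common():
        print(phrase, count)
```

Under these assumptions, misrecognized fragments such as "by nary" and fillers such as "um" drop out because they never appear in the reference text, while phrases shared with the textbook survive as index candidates.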
