Navigating Speech Recording Collections with AI-Generated Illustrations

Sirina Håland; Trond Karlsen Strøm; Petra Galuščáková

arXiv:2507.04182·cs.IR·July 8, 2025

TL;DR

This paper introduces a new AI-driven method for navigating large speech archives by integrating multimodal generative models to create visual and structured representations, enhancing accessibility and exploration.

Contribution

It presents a novel approach combining language and multimodal generative models for speech archive navigation, implemented in a web app with interactive mind maps and image generation.

Findings

01  Initial user tests suggest that the application makes speech collections easier to explore

02  The system organizes speech data into visual structures (interactive mind maps and generated images)

03  The approach has the potential to simplify exploration of large speech archives

Abstract

Although the amount of available spoken content is steadily increasing, extracting information and knowledge from speech recordings remains challenging. Beyond enhancing traditional information retrieval methods such as speech search and keyword spotting, novel approaches for navigating and searching spoken content need to be explored and developed. In this paper, we propose a novel navigational method for speech archives that leverages recent advances in language and multimodal generative models. We demonstrate our approach with a Web application that organizes data into a structured format using interactive mind maps and image generation tools. The system is implemented using the TED-LIUM 3 dataset, which comprises over 2,000 speech transcripts and audio files of TED Talks. Initial user tests using a System Usability Scale (SUS) questionnaire indicate the application's potential to…
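
The abstract mentions a System Usability Scale (SUS) questionnaire but does not show how responses are scored. As a minimal sketch, the standard SUS scoring rule can be written as below: ten items rated 1 to 5, odd-numbered items contribute (rating - 1), even-numbered items contribute (5 - rating), and the sum is multiplied by 2.5 to give a score from 0 to 100. The function name `sus_score` and the sample responses are illustrative assumptions, not taken from the paper, and the authors' actual analysis may differ.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one participant's SUS questionnaire (standard scoring rule).

    `responses` holds the ten item ratings in questionnaire order,
    each an integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each rating must be between 1 and 5")

    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd items are positively worded: higher rating is better.
            total += rating - 1
        else:
            # Even items are negatively worded: lower rating is better.
            total += 5 - rating

    # Each item contributes 0..4, so the sum is 0..40; scale to 0..100.
    return total * 2.5


# Hypothetical ratings from a single participant (not data from the paper).
example = [4, 2, 5, 1, 4, 2, 4, 2, 5, 1]
print(sus_score(example))
```

Scores from several participants are typically averaged to give one figure for the system under test.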
