PODTILE: Facilitating Podcast Episode Browsing with Auto-generated Chapters

Azin Ghazimatin; Ekaterina Garmash; Gustavo Penha; Kristen Sheets; Martin Achenbach; Oguz Semerci; Remi Galvez; Marcus Tannenberg; Sahitya Mantravadi; Divya Narayanan; Ofeliya Kalaydzhyan; Douglas Cole; Ben Carterette; Ann Clifton; Paul N. Bennett; Claudia Hauff; Mounia Lalmas

arXiv:2410.16148·cs.IR·October 22, 2024

TL;DR

PODTILE is a transformer-based model that automatically segments podcast transcripts into coherent chapters with titles, improving navigation and searchability for long-form audio content.

Contribution

We introduce PODTILE, a novel encoder-decoder transformer that generates chapter segments and titles from lengthy, unstructured podcast transcripts, addressing scalability and context preservation challenges.

Findings

01 11% ROUGE score improvement over baseline

02 Auto-generated chapters aid listener navigation

03 Chapter titles enhance search retrieval effectiveness

Abstract

Listeners of long-form talk-audio content, such as podcast episodes, often find it challenging to understand the overall structure and locate relevant sections. A practical solution is to divide episodes into chapters: semantically coherent segments labeled with titles and timestamps. Since most episodes on our platform at Spotify currently lack creator-provided chapters, automating the creation of chapters is essential. Scaling the chapterization of podcast episodes presents unique challenges. First, episodes tend to be less structured than written texts, featuring spontaneous discussions with nuanced transitions. Second, the transcripts are usually lengthy, averaging about 16,000 tokens, which necessitates efficient processing that can preserve context. To address these challenges, we introduce PODTILE, a fine-tuned encoder-decoder transformer to segment conversational data. The model…
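The abstract describes a chapter as a segment labeled with a title and a timestamp. As a minimal sketch of that output format (not the paper's model or code), the Python below represents chapters and looks up the one containing a given playback position. The names `Chapter`, `start_seconds`, and `chapter_at`, along with the sample episode, are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Chapter:
    """Hypothetical container: a chapter title plus its start timestamp."""
    title: str
    start_seconds: float


def chapter_at(chapters: list[Chapter], position_seconds: float) -> Chapter:
    """Return the chapter that contains the given playback position.

    Assumes `chapters` is non-empty and sorted by start time.
    Positions before the first start fall back to the first chapter.
    """
    starts = [c.start_seconds for c in chapters]
    index = bisect_right(starts, position_seconds) - 1
    return chapters[max(index, 0)]


# Made-up example data, not taken from the paper.
episode = [
    Chapter("Introduction", 0.0),
    Chapter("Guest background", 95.0),
    Chapter("Main discussion", 610.0),
]

print(chapter_at(episode, 100.0).title)
```

Storing only start times keeps the structure compact: each chapter implicitly ends where the next begins, so a binary search over the start times is enough to support jump-to-chapter navigation.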
