Rhapsody: A Dataset for Highlight Detection in Podcasts
Younghan Park, Anuj Diwan, David Harwath, Eunsol Choi

TL;DR
This paper introduces Rhapsody, a large dataset for podcast highlight detection, and evaluates various models, revealing the difficulty of the task even for advanced language models and the benefits of fine-tuning with in-domain data.
Contribution
The paper presents Rhapsody, a novel dataset for segment-level podcast highlight detection, and provides a comprehensive evaluation of baseline models, emphasizing the challenges and potential of fine-tuning.
Findings
State-of-the-art language models struggle with highlight detection.
Fine-tuned models outperform zero-shot approaches.
Combining speech features and transcripts improves performance.
Abstract
Podcasts have become daily companions for half a billion users. Given the enormous amount of podcast content available, highlights provide a valuable signal that helps viewers get the gist of an episode and decide if they want to invest in listening to it in its entirety. However, identifying highlights automatically is challenging due to the unstructured and long-form nature of the content. We introduce Rhapsody, a dataset of 13K podcast episodes paired with segment-level highlight scores derived from YouTube's 'most replayed' feature. We frame podcast highlight detection as a segment-level binary classification task. We explore various baseline approaches, including zero-shot prompting of language models and lightweight fine-tuned language models using segment-level classification heads. Our experimental results indicate that even state-of-the-art language models like GPT-4o and…
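The segment-level binary framing described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' procedure: the 0.5 threshold, the example scores, and the helper names `label_highlights` and `segment_f1` are hypothetical, and the abstract does not say how replay scores are binarized or which metric is reported.

```python
from typing import List


def label_highlights(scores: List[float], threshold: float = 0.5) -> List[int]:
    """Turn per-segment replay scores into binary highlight labels.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    return [1 if s >= threshold else 0 for s in scores]


def segment_f1(gold: List[int], pred: List[int]) -> float:
    """Compute F1 over segments, treating label 1 (highlight) as the positive class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of segments")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical replay scores for a five-segment episode.
scores = [0.1, 0.8, 0.4, 1.0, 0.2]
gold = label_highlights(scores)

# Hypothetical model output: one correct highlight, one false alarm, one miss.
pred = [0, 1, 1, 0, 0]
f1 = segment_f1(gold, pred)
```

A model under this framing emits one 0/1 decision per segment, so any standard binary classification metric can be computed episode by episode.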
Taxonomy
Topics: Radio, Podcasts, and Digital Media
