Topic Modeling on Podcast Short-Text Metadata
Francisco B. Valero, Marion Baranes, Elena V. Epure

TL;DR
This paper explores topic modeling on podcast metadata, which consists mostly of short texts such as titles and descriptions. It incorporates named entities to improve topic coherence, and experiments on multiple datasets demonstrate the effectiveness of the proposed approach.
Contribution
The paper introduces NEiCE, a novel document representation that leverages named entities within a Non-negative Matrix Factorization (NMF) framework for better topic modeling of podcast metadata.
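To make the idea of giving named entities extra influence in an NMF topic model concrete, here is a minimal sketch. It is not NEiCE itself, whose exact document representation is not spelled out on this page. It assumes a much simpler stand-in: tokens flagged as named entities are up-weighted in a document-term count matrix, which is then factorized with the standard Lee-Seung multiplicative updates. The helper names (`build_matrix`, `nmf`, `top_terms`), the `entity_boost` parameter, and the toy documents are all hypothetical and chosen for illustration only.

```python
import random


def build_matrix(docs, entities, entity_boost=2.0):
    """Build a document-term matrix; named-entity tokens count entity_boost times.

    The boost is an illustrative assumption, not the weighting used by NEiCE.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}
    V = [[0.0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for tok in doc:
            V[i][index[tok]] += entity_boost if tok in entities else 1.0
    return V, vocab


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def error(V, W, H):
    """Squared Frobenius reconstruction error ||V - WH||^2."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for vr, rr in zip(V, WH) for v, r in zip(vr, rr))


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factorize V (docs x terms) into non-negative W (docs x k) and H (k x terms)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H


def top_terms(H, vocab, n=3):
    """Return the n highest-weighted terms of each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in H]


if __name__ == "__main__":
    # Toy, made-up podcast titles, already tokenized; entities are pre-tagged.
    docs = [
        ["nasa", "mars", "mission", "space"],
        ["space", "rocket", "nasa", "launch"],
        ["recipe", "pasta", "italy", "cooking"],
        ["cooking", "italy", "wine", "recipe"],
    ]
    V, vocab = build_matrix(docs, entities={"nasa", "italy"})
    W, H = nmf(V, k=2)
    for topic in top_terms(H, vocab):
        print(topic)
```

Each row of `H` is a topic expressed as term weights, and each row of `W` gives a document's mixture over topics. A realistic pipeline would obtain the entity set from a named-entity recognizer and would likely use a TF-IDF or embedding-based representation rather than raw boosted counts.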
Findings
NEiCE improves topic coherence over baselines
Experiments on datasets from Spotify, iTunes, and Deezer validate the approach
Proposed method enhances organization and navigation of podcast collections
Abstract
Podcasts have emerged as a massively consumed form of online content, notably due to wider accessibility of production means and scaled distribution through large streaming platforms. Categorization systems and information access technologies typically use topics as the primary way to organize or navigate podcast collections. However, annotating podcasts with topics is still quite problematic because the assigned editorial genres are broad, heterogeneous or misleading, or because of data challenges (e.g. short metadata text, noisy transcripts). Here, we assess the feasibility of discovering relevant topics from podcast metadata (titles and descriptions) using topic modeling techniques for short text. We also propose a new strategy to leverage named entities (NEs), often present in podcast metadata, in a Non-negative Matrix Factorization (NMF) topic modeling framework. Our experiments on two…
Taxonomy
Topics: Web Data Mining and Analysis · FinTech, Crowdfunding, Digital Finance · Caching and Content Delivery
