Evaluating Podcast Recommendations with Profile-Aware LLM-as-a-Judge
Francesco Fabbri, Gustavo Penha, Edoardo D'Amico, Alice Wang, Marco De Nadai, Jackie Doremus, Paul Gigioli, Andreas Damianou, Oskar Stal, and Mounia Lalmas

TL;DR
This paper introduces a scalable, interpretable framework using Large Language Models as offline judges to evaluate personalized podcast recommendations based on user profiles derived from listening history.
Contribution
It presents a novel profile-aware approach that constructs natural-language user profiles to enable LLMs to assess recommendation quality effectively.
Findings
LLMs as judges match human judgments with high fidelity.
Profile-based evaluation outperforms prompting the LLM with raw listening data.
The framework supports efficient iterative testing of recommender systems.
Abstract
Evaluating personalized recommendations remains a central challenge, especially in long-form audio domains like podcasts, where traditional offline metrics suffer from exposure bias and online methods such as A/B testing are costly and operationally constrained. In this paper, we propose a novel framework that leverages Large Language Models (LLMs) as offline judges to assess the quality of podcast recommendations in a scalable and interpretable manner. Our two-stage profile-aware approach first constructs natural-language user profiles distilled from 90 days of listening history. These profiles summarize both topical interests and behavioral patterns, serving as compact, interpretable representations of user preferences. Rather than prompting the LLM with raw data, we use these profiles to provide high-level, semantically rich context, enabling the LLM to reason more effectively about…
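To make the two-stage idea concrete, here is a minimal Python sketch of how a profile might be distilled from listening history and then handed to a judge. It is an illustration under stated assumptions, not the authors' implementation: the `Listen` schema, the template-based summary in `build_profile`, the prompt wording in `build_judge_prompt`, and the yes/no answer format are all hypothetical. The abstract does not say how profiles are generated or how the judge is prompted. Only the 90-day window and the profile-then-judge split come from the text.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Listen:
    """One listening event (hypothetical schema, not from the paper)."""
    day: date
    show: str
    topic: str
    minutes: float


def build_profile(history, today, window_days=90, top_k=3):
    """Stage 1 (assumed form): summarize recent listening as short text."""
    cutoff = today - timedelta(days=window_days)
    recent = [e for e in history if cutoff <= e.day <= today]
    if not recent:
        return "No listening activity in the window."

    # Topical interests: rank topics by total minutes listened.
    minutes_by_topic = Counter()
    for e in recent:
        minutes_by_topic[e.topic] += e.minutes
    top_topics = [t for t, _ in minutes_by_topic.most_common(top_k)]

    # Behavioral pattern: session count and average session length.
    avg_minutes = sum(e.minutes for e in recent) / len(recent)
    return (
        f"Topical interests: {', '.join(top_topics)}. "
        f"Behavior: {len(recent)} sessions in the last {window_days} days, "
        f"averaging {avg_minutes:.0f} minutes each."
    )


def build_judge_prompt(profile, candidate_description):
    """Stage 2 (assumed form): pair the profile with a candidate episode."""
    return (
        "You are evaluating a podcast recommendation.\n"
        f"User profile: {profile}\n"
        f"Recommended episode: {candidate_description}\n"
        "Does this episode match the user's interests? "
        "Answer 'yes' or 'no' and give a one-sentence rationale."
    )
```

The string returned by `build_judge_prompt` would then be sent to whichever LLM serves as the judge. That call is left out here because the abstract names no specific model or API.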
