Minimizing Live Experiments in Recommender Systems: User Simulation to Evaluate Preference Elicitation Policies
Chih-Wei Hsu, Martin Mladenov, Ofer Meshi, James Pine, Hubert Pham, Shane Li, Xujian Liang, Anton Polishko, Li Yang, Ben Scheetz, Craig Boutilier

TL;DR
This paper presents a simulation-based approach to evaluate preference elicitation policies in recommender systems, reducing reliance on costly live experiments by using robust user behavior models and a simulation platform.
Contribution
It introduces a counterfactually robust simulation methodology for evaluating onboarding algorithms, specifically applied to YouTube Music, enhancing evaluation efficiency and reliability.
Findings
Simulation accurately predicts live deployment performance.
Reduced need for costly real-user experiments.
Effective evaluation of preference elicitation algorithms.
Abstract
Evaluation of policies in recommender systems typically involves A/B testing using live experiments on real users to assess a new policy's impact on relevant metrics. This "gold standard" comes at a high cost, however, in terms of cycle time, user cost, and potential user retention. In developing policies for "onboarding" new users, these costs can be especially problematic, since onboarding occurs only once. In this work, we describe a simulation methodology used to augment (and reduce) the use of live experiments. We illustrate its deployment for the evaluation of "preference elicitation" algorithms used to onboard new users of the YouTube Music platform. By developing counterfactually robust user behavior models, and a simulation service that couples such models with production infrastructure, we are able to test new algorithms in a way that reliably predicts their performance…
