Dynamic Prior Thompson Sampling for Cold-Start Exploration in Recommender Systems

Zhenyu Zhao; David Zhang; Ellie Zhao; Ehsan Saberian

arXiv:2602.00943 · cs.LG · February 3, 2026

Open Access

TL;DR

This paper introduces Dynamic Prior Thompson Sampling, a prior design for cold-start exploration in recommender systems. It sets the prior so that the probability of a new item outcompeting the incumbent winner is explicitly controlled, which makes exploration more efficient and more predictable.

Contribution

The paper presents a closed-form quadratic solution for the prior mean in Thompson Sampling, enabling explicit control over exploration intensity during cold start in large-scale recommender systems.
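To make the idea of "setting the prior to hit a target win probability" concrete, here is a minimal numerical sketch. It is not the paper's closed-form quadratic solution, which this page does not reproduce; it swaps in Monte Carlo estimation plus bisection as a stand-in. The incumbent posterior Beta(51, 951), the prior strength of 2 pseudo-observations, the 10% target, and the function names are all assumptions chosen for illustration.

```python
import random


def win_probability(prior_mean, prior_strength, incumbent_draws, seed=0):
    """Monte Carlo estimate of P(X_new > X_incumbent).

    The new arm has a Beta prior with the given mean and strength
    (alpha + beta). Re-seeding on every call keeps estimates comparable.
    """
    rng = random.Random(seed)
    alpha = prior_mean * prior_strength
    beta = (1.0 - prior_mean) * prior_strength
    wins = sum(rng.betavariate(alpha, beta) > x for x in incumbent_draws)
    return wins / len(incumbent_draws)


def solve_prior_mean(target, prior_strength, incumbent_draws, iters=20):
    """Bisection on the prior mean (a numerical stand-in, not the paper's formula).

    At fixed strength the win probability increases with the prior mean,
    so bisection homes in on the mean that meets the target.
    """
    lo, hi = 1e-6, 1.0 - 1e-6
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if win_probability(mid, prior_strength, incumbent_draws) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical incumbent: 50 successes in 1000 impressions on top of Beta(1, 1),
# giving the posterior Beta(51, 951). These numbers are illustrative only.
draw_rng = random.Random(42)
incumbent_draws = [draw_rng.betavariate(51, 951) for _ in range(10000)]

target = 0.10          # assumed target probability of beating the incumbent
prior_strength = 2.0   # assumed pseudo-observation count, same as Beta(1, 1)

prior_mean = solve_prior_mean(target, prior_strength, incumbent_draws)
achieved = win_probability(prior_mean, prior_strength, incumbent_draws)
print(f"prior mean = {prior_mean:.4f}, achieved win probability = {achieved:.3f}")
```

Under these assumptions the solved prior mean lands well below 0.5, so the new arm wins a Thompson draw against the incumbent roughly as often as the target asks rather than almost always.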

Findings

01. Enhanced exploration control in simulations and online experiments.

02. Improved efficiency over uniform prior baseline.

03. Effective management of cold-start bias in recommender systems.

Abstract

Cold-start exploration is a core challenge in large-scale recommender systems: new or data-sparse items must receive traffic to estimate value, but over-exploration harms users and wastes impressions. In practice, Thompson Sampling (TS) is often initialized with a uniform Beta(1,1) prior, implicitly assuming a 50% success rate for unseen items. When true base rates are far lower, this optimistic prior systematically over-allocates to weak items. The impact is amplified by batched policy updates and pipeline latency: for hours, newly launched items can remain effectively "no data," so the prior dominates allocation before feedback is incorporated. We propose Dynamic Prior Thompson Sampling, a prior design that directly controls the probability that a new arm outcompetes the incumbent winner. Our key contribution is a closed-form quadratic solution for the prior mean that enforces P(X_j >…
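The over-allocation described in the abstract can be checked with a few lines of simulation. The sketch below estimates how often a brand-new arm with a Beta(1,1) prior outsamples an incumbent whose success rate is low. The incumbent posterior Beta(51, 951), corresponding to a hypothetical 50 successes in 1000 impressions, is an assumption for illustration and does not come from the paper.

```python
import random


def uniform_prior_win_rate(inc_alpha, inc_beta, n_draws=20000, seed=0):
    """Fraction of Thompson draws in which a Beta(1, 1) arm beats the incumbent."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_draws):
        new_sample = rng.betavariate(1, 1)                  # unseen item, uniform prior
        inc_sample = rng.betavariate(inc_alpha, inc_beta)   # incumbent posterior draw
        wins += new_sample > inc_sample
    return wins / n_draws


# Hypothetical incumbent with a success rate of about 5%.
p_uniform = uniform_prior_win_rate(51, 951)

# For a uniform draw U, P(U > X) = 1 - E[X], so the exact value is 1 - 51/1002.
p_exact = 1 - 51 / 1002
print(f"simulated: {p_uniform:.3f}, exact: {p_exact:.3f}")
```

With an incumbent near a 5% success rate, the unseen item wins about 95% of draws, which is the systematic over-allocation the abstract attributes to the uniform prior while no feedback has yet been incorporated.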

Taxonomy

Topics: Recommender Systems and Techniques · Advanced Bandit Algorithms Research · Consumer Market Behavior and Pricing