Prior Ordering and Monotonicity in Dirichlet Bandits
Yaming Yu

TL;DR
This paper investigates how the expected payoff and optimal strategies in Dirichlet bandit problems change with prior information, revealing monotonic relationships and extending classical results in Bayesian bandit theory.
Contribution
It establishes new monotonicity properties of the maximum expected payoff with respect to Dirichlet process priors, settling a conjecture and extending previous work on Bernoulli bandits.
Findings
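For a fixed prior weight, the maximum expected payoff increases as the prior mean becomes larger in the increasing convex order.
For a fixed prior mean, the maximum expected payoff decreases as the prior weight increases.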
Results generalize classical bandit theory to Dirichlet process priors.
Abstract
One of two independent stochastic processes (arms) is to be selected at each of n stages. The selection is sequential and depends on past observations as well as the prior information. Observations from arm i are independent given a distribution P_i, and, following Clayton and Berry (1985), the P_i have independent Dirichlet process priors. The objective is to maximize the expected future-discounted sum of the n observations. We study structural properties of the bandit, in particular how the maximum expected payoff and the optimal strategy vary with the Dirichlet process priors. The main results are (i) for a particular arm and a fixed prior weight, the maximum expected payoff increases as the mean of the Dirichlet process prior becomes larger in the increasing convex order; (ii) for a fixed prior mean, the maximum expected payoff decreases as the prior weight increases. Specializing to…
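To make results (i) and (ii) concrete, here is a minimal sketch for the Bernoulli special case only, where a Dirichlet process prior on {0, 1} reduces to a Beta(a, b) prior with mean a / (a + b) and weight a + b. The function name `max_expected_payoff`, its parameters, and the single constant `discount` factor are illustrative assumptions, not notation from the paper. The code computes the maximum expected payoff by backward induction over the posterior counts.

```python
from functools import lru_cache


def max_expected_payoff(a1, b1, a2, b2, n, discount=1.0):
    """Value of an n-stage two-armed Bernoulli bandit.

    Arm i has an independent Beta(a_i, b_i) prior on its success
    probability (assumed Bernoulli special case of a Dirichlet process
    prior). `discount` is a hypothetical constant per-stage discount
    factor; 1.0 means uniform weighting of the n stages.
    """

    @lru_cache(maxsize=None)
    def value(s1, f1, s2, f2, stages_left):
        # s_i / f_i: successes / failures observed so far on arm i.
        if stages_left == 0:
            return 0.0
        # Posterior mean success probability of each arm.
        p1 = (a1 + s1) / (a1 + b1 + s1 + f1)
        p2 = (a2 + s2) / (a2 + b2 + s2 + f2)
        # Expected payoff of pulling each arm now, then acting optimally.
        pull1 = (p1 * (1.0 + discount * value(s1 + 1, f1, s2, f2, stages_left - 1))
                 + (1.0 - p1) * discount * value(s1, f1 + 1, s2, f2, stages_left - 1))
        pull2 = (p2 * (1.0 + discount * value(s1, f1, s2 + 1, f2, stages_left - 1))
                 + (1.0 - p2) * discount * value(s1, f1, s2, f2 + 1, stages_left - 1))
        return max(pull1, pull2)

    return value(0, 0, 0, 0, n)


# Arm 2 is held fixed at Beta(1, 1); only arm 1's prior varies.
n = 10

# (ii) Fixed prior mean 0.5 for arm 1, increasing prior weight 2, 4, 8.
by_weight = [max_expected_payoff(w / 2, w / 2, 1, 1, n) for w in (2, 4, 8)]
print("value by prior weight:", by_weight)

# (i) Fixed prior weight 2 for arm 1, increasing prior mean 0.25, 0.5, 0.75.
by_mean = [max_expected_payoff(2 * m, 2 * (1 - m), 1, 1, n) for m in (0.25, 0.5, 0.75)]
print("value by prior mean:", by_mean)
```

Under these assumptions, the first list should be non-increasing and the second non-decreasing, matching (ii) and (i) respectively. For Bernoulli arms the mean measure is itself a Bernoulli distribution, so the increasing convex order reduces to ordinary ordering of the prior means; the general Dirichlet process case in the paper is not covered by this sketch.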
Taxonomy
Topics: Advanced Bandit Algorithms Research · Recommender Systems and Techniques
