Reward Selection with Noisy Observations
Kamyar Azizzadenesheli, Trung Dang, Aranyak Mehta, Alexandros Psomas, Qian Zhang

TL;DR
This paper investigates the problem of selecting the best box with noisy reward observations, showing naive and linear policies can be arbitrarily bad, and proposing a threshold policy that guarantees a constant approximation under certain conditions.
Contribution
It proves the suboptimality of naive and linear policies and introduces a simple threshold policy that achieves a constant approximation to the optimal reward under specific distributional conditions.
Findings
Naive and linear policies can be arbitrarily bad compared to the optimal.
A simple threshold policy achieves a constant approximation under the small tail condition.
Without the small tail condition, even an optimal clairvoyant cannot guarantee a constant approximation.
Abstract
We study a fundamental problem in optimization under uncertainty. There are $n$ boxes; each box $i$ contains a hidden reward $x_i$. Rewards are drawn i.i.d. from an unknown distribution $\mathcal{D}$. For each box $i$, we see $y_i$, an unbiased estimate of its reward, which is drawn from a Normal distribution with known standard deviation $\sigma_i$ (and an unknown mean $x_i$). Our task is to select a single box, with the goal of maximizing our reward. This problem captures a wide range of applications, e.g. ad auctions, where the hidden reward is the click-through rate of an ad. Previous work in this model [BKMR12] proves that the naive policy, which selects the box with the largest estimate $y_i$, is suboptimal, and suggests a linear policy, which selects the box $i$ with the largest $y_i - c \cdot \sigma_i$, for some $c > 0$. However, no formal guarantees are given about the…
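To make the two policies named in the abstract concrete, here is a minimal simulation sketch. It is not the authors' code. The function names, the choice of a Uniform(0, 1) reward distribution, the noise levels, and the value of c are all assumptions made purely for illustration.

```python
import random


def naive_policy(estimates, sigmas):
    """Select the box with the largest estimate y_i (noise levels are ignored)."""
    return max(range(len(estimates)), key=lambda i: estimates[i])


def linear_policy(estimates, sigmas, c):
    """Select the box with the largest y_i - c * sigma_i, for a chosen c > 0."""
    return max(range(len(estimates)), key=lambda i: estimates[i] - c * sigmas[i])


def simulate(policy, sigmas, trials=20000, seed=0, **policy_kwargs):
    """Estimate the average hidden reward collected by `policy`.

    Assumption for illustration only: rewards are i.i.d. Uniform(0, 1).
    In the paper's model the reward distribution is unknown.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Hidden rewards x_i, one per box.
        rewards = [rng.random() for _ in sigmas]
        # Observed estimates y_i ~ Normal(x_i, sigma_i), unbiased for x_i.
        estimates = [rng.gauss(x, s) for x, s in zip(rewards, sigmas)]
        chosen = policy(estimates, sigmas, **policy_kwargs)
        total += rewards[chosen]
    return total / trials


if __name__ == "__main__":
    # Hypothetical instance: five precise boxes and five very noisy ones.
    sigmas = [0.05] * 5 + [5.0] * 5
    print("naive :", simulate(naive_policy, sigmas))
    print("linear:", simulate(linear_policy, sigmas, c=3.0))
```

In this particular setup the naive policy tends to be drawn to the noisy boxes, whose large estimates are mostly noise, while the linear penalty steers the choice toward the precise boxes. This is a single hand-picked instance and says nothing about worst-case behavior, which is the question the paper studies.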
Taxonomy
Topics: Auction Theory and Applications · Advanced Bandit Algorithms Research · Consumer Market Behavior and Pricing
