Reward Selection with Noisy Observations

Kamyar Azizzadenesheli; Trung Dang; Aranyak Mehta; Alexandros Psomas; Qian Zhang

arXiv:2307.05953·cs.GT·July 13, 2023

TL;DR

This paper investigates the problem of selecting the best box with noisy reward observations, showing naive and linear policies can be arbitrarily bad, and proposing a threshold policy that guarantees a constant approximation under certain conditions.

Contribution

It proves the suboptimality of naive and linear policies and introduces a simple threshold policy that achieves a constant approximation to the optimal reward under specific distributional conditions.

Findings

1. Naive and linear policies can be arbitrarily bad compared to the optimal.

2. A simple threshold policy achieves a constant approximation under the small tail condition.

3. Without the small tail condition, even an optimal clairvoyant cannot guarantee a constant approximation.

Abstract

We study a fundamental problem in optimization under uncertainty. There are $n$ boxes; each box $i$ contains a hidden reward $x_{i}$. Rewards are drawn i.i.d. from an unknown distribution $D$. For each box $i$, we see $y_{i}$, an unbiased estimate of its reward, which is drawn from a Normal distribution with known standard deviation $\sigma_{i}$ (and an unknown mean $x_{i}$). Our task is to select a single box, with the goal of maximizing our reward. This problem captures a wide range of applications, e.g., ad auctions, where the hidden reward is the click-through rate of an ad. Previous work in this model [BKMR12] proves that the naive policy, which selects the box with the largest estimate $y_{i}$, is suboptimal, and suggests a linear policy, which selects the box $i$ with the largest $y_{i} - c \cdot \sigma_{i}$, for some $c > 0$. However, no formal guarantees are given about the…
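
The two baseline policies defined in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names, the choice of $D$ as Uniform(0, 1), and the specific noise levels in `simulate` are all assumptions made here for the example. It does not implement the paper's threshold policy, whose definition is not given in this summary.

```python
import random


def naive_policy(estimates):
    """Select the box with the largest estimate y_i."""
    return max(range(len(estimates)), key=lambda i: estimates[i])


def linear_policy(estimates, sigmas, c):
    """Select the box with the largest y_i - c * sigma_i, for some c > 0."""
    if c <= 0:
        raise ValueError("c must be positive")
    return max(range(len(estimates)), key=lambda i: estimates[i] - c * sigmas[i])


def simulate(sigmas, c, trials, seed=0):
    """Average hidden reward collected by each policy.

    Assumption for illustration only: D is Uniform(0, 1). In the model
    itself D is unknown to the decision maker.
    """
    rng = random.Random(seed)
    totals = {"naive": 0.0, "linear": 0.0}
    for _ in range(trials):
        # Hidden rewards x_i, drawn i.i.d. from the assumed D.
        rewards = [rng.random() for _ in sigmas]
        # Observations y_i ~ Normal(x_i, sigma_i), unbiased for x_i.
        estimates = [rng.gauss(x, s) for x, s in zip(rewards, sigmas)]
        totals["naive"] += rewards[naive_policy(estimates)]
        totals["linear"] += rewards[linear_policy(estimates, sigmas, c)]
    return {name: total / trials for name, total in totals.items()}
```

Calling, for example, `simulate([0.1] * 5 + [5.0] * 5, c=2.0, trials=2000)` mixes five low-noise and five high-noise boxes. In such a setup the naive policy tends to chase large estimates that come from noisy boxes, which is the kind of suboptimality the abstract attributes to it. A single favorable instance like this says nothing about worst-case behavior of the linear policy.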

Taxonomy

Topics: Auction Theory and Applications · Advanced Bandit Algorithms Research · Consumer Market Behavior and Pricing