Optimal Policies for the Homogeneous Selective Labels Problem
Dennis Wei

TL;DR
This paper studies optimal decision policies under selective labels, where outcomes are observed only under one of the possible decisions. For discounted total reward, the optimal policy is a threshold policy; for undiscounted infinite-horizon average reward, optimal policies have positive acceptance probability in every state.
Contribution
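It characterizes the structure of optimal policies in the homogeneous, online selective labels problem: a threshold policy under the discounted total reward criterion, and policies with positive acceptance probability in all states under the undiscounted infinite-horizon average reward criterion.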
Findings
The optimal policy for discounted total reward is a threshold policy.
For undiscounted infinite-horizon average reward, optimal policies have positive acceptance probability in all states.
Under the discounted criterion, the problem is one of optimal stopping.
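One way to see why the discounted problem becomes one of stopping: under a deterministic rule that depends only on the current belief, a rejection produces no label, so the belief and therefore the next decision do not change, and rejection is absorbing. The sketch below assumes a Beta-Bernoulli model of the homogeneous population (belief = Beta posterior counts `(a, b)`) and a hypothetical fixed threshold `tau` on the posterior mean. The names `run_threshold_policy`, `p_true`, and `tau` are illustrative, and a fixed `tau` is a simplification, not the paper's optimal policy.

```python
import random


def run_threshold_policy(p_true, tau, a0, b0, steps, seed=0):
    """Simulate a hypothetical fixed-threshold rule on a Beta posterior (a, b).

    Accept iff the posterior mean a / (a + b) is at least tau. Only accepted
    individuals reveal an outcome, drawn as Bernoulli(p_true).
    """
    rng = random.Random(seed)
    a, b = a0, b0
    decisions = []
    for _ in range(steps):
        accept = a / (a + b) >= tau
        decisions.append(accept)
        if accept:
            # Selective label: the outcome is observed only after acceptance.
            y = 1 if rng.random() < p_true else 0
            a, b = a + y, b + (1 - y)
        # On rejection no label is observed, so (a, b) and the next decision
        # stay the same: once the rule rejects, it rejects forever.
    return decisions, (a, b)


decisions, posterior = run_threshold_policy(p_true=0.6, tau=0.45, a0=1, b0=1, steps=20, seed=1)
first_reject = decisions.index(False) if False in decisions else None
print(first_reject, posterior)
```

Because such a rule can switch from accepting to rejecting only once, choosing it amounts to choosing when to stop collecting labels.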
Abstract
Selective labels are a common feature of consequential decision-making applications, referring to the lack of observed outcomes under one of the possible decisions. This paper reports work in progress on learning decision policies in the face of selective labels. The setting considered is both a simplified homogeneous one, disregarding individuals' features to facilitate determination of optimal policies, and an online one, to balance costs incurred in learning with future utility. For maximizing discounted total reward, the optimal policy is shown to be a threshold policy, and the problem is one of optimal stopping. In contrast, for undiscounted infinite-horizon average reward, optimal policies have positive acceptance probability in all states. Future work stemming from these results is discussed.
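The discounted result can be made concrete with a small dynamic program. This is a sketch under assumptions not stated in the summary: a Beta-Bernoulli model of the homogeneous population (state = Beta posterior counts `(a, b)` of observed successes and failures), a reward of outcome minus a hypothetical cost `cost` for each acceptance and 0 for each rejection, a discount factor `gamma`, and a finite look-ahead `depth` after which the posterior is treated as final. Since rejecting reveals no label and leaves the state unchanged, rejecting once means rejecting forever, which gives the stopping option a value of 0. The names `make_policy`, `cost`, `gamma`, and `depth` are hypothetical.

```python
from functools import lru_cache


def make_policy(a0, b0, cost, gamma, depth):
    """Return (should_accept, value) for a hypothetical Beta-Bernoulli model.

    State (a, b): Beta posterior counts over the unknown success probability.
    Accepting earns (outcome - cost) and reveals the outcome; rejecting earns 0
    and reveals nothing, so the state is unchanged.
    """

    @lru_cache(maxsize=None)
    def value(a, b):
        mean = a / (a + b)
        if (a + b) - (a0 + b0) >= depth:
            # Truncation (an approximation): treat the posterior as final and
            # accept forever iff the posterior mean exceeds the cost.
            return max(0.0, (mean - cost) / (1.0 - gamma))
        # Rejecting freezes the state, so "reject now" means "reject forever",
        # worth 0: this is the stopping option.
        return max(0.0, accept_value(a, b))

    @lru_cache(maxsize=None)
    def accept_value(a, b):
        # Expected immediate reward plus discounted value of the updated posterior.
        mean = a / (a + b)
        return (mean - cost) + gamma * (
            mean * value(a + 1, b) + (1.0 - mean) * value(a, b + 1)
        )

    def should_accept(a, b):
        return accept_value(a, b) > 0.0

    return should_accept, value


should_accept, value = make_policy(a0=1, b0=1, cost=0.5, gamma=0.9, depth=30)
row = [should_accept(a, 10 - a) for a in range(1, 10)]
print(row)
```

Under these assumptions, for a fixed number of observed labels the acceptance decision is monotone in the number of successes, i.e., a threshold on the posterior. The truncation at `depth` makes the computed values approximate rather than exact.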
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Optimization and Search Problems
