Finite-time Analysis for the Knowledge-Gradient Policy
Yingfei Wang, Warren Powell

TL;DR
This paper provides a finite-time analysis of the knowledge-gradient policy in sequential decision problems, introducing new theoretical bounds and insights into its performance based on submodularity, supported by empirical experiments.
Contribution
It offers the first finite-time bounds for the knowledge-gradient policy and introduces the concept of prior-optimality in Bayesian ranking problems.
Findings
Derives finite-time bounds for the knowledge-gradient policy.
Establishes submodularity for the two-alternative case.
Presents empirical results illustrating the policy's finite-time behavior.
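For reference, the abstract below frames the knowledge-gradient bound in terms of adaptive submodularity. The block restates the standard set-based definitions from Golovin and Krause's adaptive-submodularity framework, not from this paper. The paper works with a multiset variant, since an alternative can be measured repeatedly, and its own finite-time bound is not reproduced here. In this sketch, psi is a partial realization (the observations seen so far), Phi is the unknown full realization, f is the objective, and x is a candidate measurement.

```latex
% Conditional expected marginal benefit of measuring x, given partial realization \psi
\Delta(x \mid \psi) = \mathbb{E}\left[ f\big(\mathrm{dom}(\psi) \cup \{x\}, \Phi\big) - f\big(\mathrm{dom}(\psi), \Phi\big) \;\middle|\; \Phi \sim \psi \right]

% Adaptive monotonicity
\Delta(x \mid \psi) \ge 0 \quad \text{for all } \psi,\ x

% Adaptive submodularity (diminishing returns)
\Delta(x \mid \psi) \ge \Delta(x \mid \psi') \quad \text{whenever } \psi \subseteq \psi',\ x \notin \mathrm{dom}(\psi')

% Greedy guarantee for adaptive monotone submodular f with a budget of k measurements
f_{\mathrm{avg}}\big(\pi^{\mathrm{greedy}}_{[k]}\big) \ge \left(1 - \tfrac{1}{e}\right) f_{\mathrm{avg}}\big(\pi^{*}_{[k]}\big)
```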
Abstract
We consider sequential decision problems in which we adaptively choose one of finitely many alternatives and observe a stochastic reward. We offer a new perspective that interprets Bayesian ranking and selection problems as adaptive stochastic multi-set maximization problems, and we derive the first finite-time bound on the knowledge-gradient policy for adaptive submodular objective functions. In addition, we introduce the concept of prior-optimality and provide another insight into the performance of the knowledge-gradient policy based on the assumption that the value of information is submodular. We demonstrate submodularity for the two-alternative case and provide other conditions for more general problems, highlighting the issue and importance of submodularity in learning problems. We conduct empirical experiments to further illustrate the finite-time behavior of the knowledge-gradient…
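To make the policy concrete, here is a minimal sketch of the knowledge-gradient rule in its textbook setting, assuming independent normal beliefs on each alternative and Gaussian observation noise with known variance. This is the standard one-step KG formula for that model, not code from the paper. The function names (kg_factors, update, run_kg) and parameters (noise_var, budget, seed) are hypothetical illustrations. At each step the policy measures the alternative whose single observation is expected to most increase the maximum posterior mean. After the budget is spent, it selects the alternative with the highest posterior mean.

```python
import math
import random


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kg_factors(mu, var, noise_var):
    """One-step knowledge-gradient value of measuring each alternative once.

    Assumes independent normal beliefs N(mu[x], var[x]), at least two
    alternatives, and Gaussian noise with known variance noise_var.
    """
    factors = []
    for x in range(len(mu)):
        # Std. dev. of the change in mu[x] caused by one noisy observation:
        # sqrt(var - posterior_var) = var / sqrt(var + noise_var).
        sigma_tilde = var[x] / math.sqrt(var[x] + noise_var)
        best_other = max(mu[i] for i in range(len(mu)) if i != x)
        if sigma_tilde == 0.0:
            factors.append(0.0)
            continue
        zeta = -abs(mu[x] - best_other) / sigma_tilde
        # KG = sigma_tilde * f(zeta), with f(z) = z * Phi(z) + phi(z).
        factors.append(sigma_tilde * (zeta * normal_cdf(zeta) + normal_pdf(zeta)))
    return factors


def update(mu, var, x, y, noise_var):
    """Conjugate normal update of alternative x after observing reward y (in place)."""
    precision = 1.0 / var[x] + 1.0 / noise_var
    mu[x] = (mu[x] / var[x] + y / noise_var) / precision
    var[x] = 1.0 / precision


def run_kg(true_means, prior_mu, prior_var, noise_var, budget, seed=0):
    """Simulate the KG policy for `budget` measurements; return final choice and beliefs."""
    rng = random.Random(seed)
    mu, var = list(prior_mu), list(prior_var)
    for _ in range(budget):
        kg = kg_factors(mu, var, noise_var)
        x = max(range(len(kg)), key=kg.__getitem__)  # ties go to the lowest index
        y = rng.gauss(true_means[x], math.sqrt(noise_var))
        update(mu, var, x, y, noise_var)
    choice = max(range(len(mu)), key=mu.__getitem__)
    return choice, mu, var
```

For example, `run_kg([0.0, 5.0], [0.0, 0.0], [100.0, 100.0], 1.0, 20)` spends 20 measurements and returns the index with the highest posterior mean. Re-running with different seeds and budgets gives a simple way to observe finite-time behavior under these assumed model settings.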
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
