Selecting Computations: Theory and Applications
Nicholas Hay, Stuart Russell, David Tolpin, Solomon Eyal Shimony

TL;DR
This paper develops a Bayesian framework for metalevel decision-making in sequential decision problems, providing theoretical bounds and heuristics that outperform bandit-based methods in game and decision tasks.
Contribution
It introduces a Bayesian approach to metalevel decisions, deriving finite-sample bounds and heuristics that improve over bandit algorithms in Monte Carlo selection problems.
Findings
Finite sampling bounds for optimal policies in certain cases
Heuristic methods outperform bandit-based heuristics in experiments
A counterexample shows that an optimal policy need not always reach a decision
Abstract
Sequential decision problems are often approximately solvable by simulating possible future action sequences. Metalevel decision procedures have been developed for selecting which action sequences to simulate, based on estimating the expected improvement in decision quality that would result from any particular simulation; an example is the recent work on using bandit algorithms to control Monte Carlo tree search in the game of Go. In this paper we develop a theoretical basis for metalevel decisions in the statistical framework of Bayesian selection problems, arguing (as others have done) that this is more appropriate than the bandit framework. We derive a number of basic results applicable to Monte Carlo selection problems, including the first finite sampling bounds for optimal policies in certain cases; we also provide a simple counterexample to the intuitive…
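The abstract's central idea, choosing which simulation to run by estimating the expected improvement in decision quality it would produce, can be illustrated with a one-step (myopic) value-of-information rule in a simple Bayesian selection problem. This is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes each candidate action has a Bernoulli outcome with a Beta posterior, that the final decision picks the action with the highest posterior mean, and that every simulation has a fixed cost. `BetaArm`, `myopic_voi`, `metalevel_step`, `run`, and the `cost` parameter are hypothetical names introduced only for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BetaArm:
    """Beta(alpha, beta) posterior over one action's unknown Bernoulli success rate."""
    alpha: float = 1.0
    beta: float = 1.0

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def myopic_voi(arms: List[BetaArm], i: int) -> float:
    """Expected gain in the chosen action's posterior mean from one more sample of arm i.

    Assumes at least two arms and that the final decision picks the highest posterior mean.
    """
    best_other = max(a.mean() for j, a in enumerate(arms) if j != i)
    arm = arms[i]
    n = arm.alpha + arm.beta
    p_one = arm.mean()                        # predictive probability that the sample is 1
    mean_if_one = (arm.alpha + 1) / (n + 1)   # posterior mean after observing a 1
    mean_if_zero = arm.alpha / (n + 1)        # posterior mean after observing a 0
    expected_after = (p_one * max(mean_if_one, best_other)
                      + (1 - p_one) * max(mean_if_zero, best_other))
    return expected_after - max(arm.mean(), best_other)


def metalevel_step(arms: List[BetaArm], cost: float) -> Optional[int]:
    """Index of the arm to simulate next, or None to stop and commit to a decision."""
    vois = [myopic_voi(arms, i) for i in range(len(arms))]
    best = max(range(len(arms)), key=lambda i: vois[i])
    return best if vois[best] > cost else None


def run(true_rates: List[float], cost: float, seed: int = 0,
        max_steps: int = 1000) -> Tuple[int, int]:
    """Simulate the myopic policy; returns (chosen arm, number of simulations used)."""
    rng = random.Random(seed)
    arms = [BetaArm() for _ in true_rates]
    steps = 0
    while steps < max_steps:
        i = metalevel_step(arms, cost)
        if i is None:
            break
        # Hypothetical stand-in for one Monte Carlo rollout of action i.
        if rng.random() < true_rates[i]:
            arms[i].alpha += 1
        else:
            arms[i].beta += 1
        steps += 1
    choice = max(range(len(arms)), key=lambda j: arms[j].mean())
    return choice, steps


if __name__ == "__main__":
    print(run([0.3, 0.6, 0.5], cost=0.001))
```

With two uniform Beta(1, 1) priors, `myopic_voi` gives 1/12 for either arm, so `metalevel_step(..., cost=0.01)` simulates arm 0. Unlike a UCB-style bandit rule, which balances rewards collected while sampling, this rule values a simulation only by how much it may improve the final choice, and it stops once that expected gain falls below the cost. A known weakness of the one-step estimate is that it can be zero when a single sample cannot change the decision (for example Beta(10, 1) against Beta(1, 10)), even though several samples together could.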
Taxonomy
Topics: Sports Analytics and Performance · Advanced Bandit Algorithms Research · Artificial Intelligence in Games
