Gaussian One-Armed Bandit and Optimization of Batch Data Processing
Alexander Kolnogorov

TL;DR
This paper analyzes the minimax strategy for Gaussian one-armed bandits in batch data processing, deriving equations to compute optimal strategies and showing batch processing's near-optimality in large-sample scenarios.
Contribution
It derives a recursive integro-difference equation for the minimax strategy and risk in the Gaussian one-armed bandit with batch data and, in the limit of infinitely many batches, a second-order PDE, linking the game-theoretic formulation to numerical computation.
Findings
Minimax risk can be approximated by the solution of a PDE when the number of batches is large.
Batch data processing nearly matches optimal one-by-one processing in large-sample cases.
Theoretical equations enable numerical computation of strategies and risks.
Abstract
We consider the minimax setup for the Gaussian one-armed bandit problem, i.e., the two-armed bandit problem with Gaussian distributions of incomes and a known distribution corresponding to the first arm. This setup arises naturally when optimizing batch data processing with two alternative processing methods, where the efficiency of the first method is known a priori. One should estimate the efficiency of the second method and ensure predominant use of the more efficient of the two. According to the main theorem of the theory of games, the minimax strategy and minimax risk are sought as the Bayesian ones corresponding to the worst-case prior distribution. As a result, we obtain a recursive integro-difference equation and, in the limiting case as the number of batches goes to infinity, a second-order partial differential equation. This makes it possible to…
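The batch setting described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's minimax strategy. It uses a hypothetical threshold rule, assumes unit-variance incomes, and takes the known first-arm mean as m1 = 0. The function names and all parameter values are illustrative assumptions. The rule applies the second (unknown) method batch by batch and switches permanently to the first method once the scaled cumulative income drops below a threshold.

```python
import random


def simulate_regret(m2, n_batches=50, batch_size=100, threshold=-1.0,
                    m1=0.0, seed=0):
    """Run one horizon and return the regret N*max(m1, m2) - expected income.

    Assumptions (not from the paper): unit variance, a fixed threshold rule,
    and regret measured with the true means of the arms actually chosen.
    """
    rng = random.Random(seed)
    horizon = n_batches * batch_size
    total_sum = 0.0        # cumulative observed income from the second arm
    pulls2 = 0             # number of items processed by the second arm
    expected_income = 0.0
    using_second = True

    for _ in range(n_batches):
        if using_second:
            # The sum of batch_size N(m2, 1) incomes is N(batch_size*m2, batch_size).
            batch_sum = rng.gauss(batch_size * m2, batch_size ** 0.5)
            total_sum += batch_sum
            pulls2 += batch_size
            expected_income += batch_size * m2
            # Hypothetical decision rule: abandon the second arm for good once
            # its centered cumulative income, scaled by sqrt(N), is too low.
            if (total_sum - pulls2 * m1) / horizon ** 0.5 < threshold:
                using_second = False
        else:
            # The first arm is known, so using it yields no new information.
            expected_income += batch_size * m1

    return horizon * max(m1, m2) - expected_income


def average_regret(m2, runs=200, **kwargs):
    """Average the regret over several independent seeds."""
    return sum(simulate_regret(m2, seed=s, **kwargs) for s in range(runs)) / runs


if __name__ == "__main__":
    # Scan a few second-arm means; the largest average approximates the
    # worst-case regret of this particular threshold rule.
    for m2 in (-0.1, -0.05, -0.02, 0.02, 0.05, 0.1):
        print(m2, round(average_regret(m2), 1))
```

Taking the maximum of the averaged regret over a grid of m2 values gives a rough empirical worst-case figure for this rule. A minimax strategy, as discussed in the abstract, would instead minimize that worst case over all strategies.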
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Reinforcement Learning in Robotics
