Note on Thompson sampling for large decision problems
Tao Hu, Eric B. Laber, Zhen Li, Nick J. Meyer, Krishna Pacifici

TL;DR
This paper introduces a scalable variant of Thompson sampling for large decision problems, demonstrating its consistency, convergence rates, and practical effectiveness through simulations in epidemiology and wildlife management.
Contribution
It proposes a simple, implementable Thompson sampling method suitable for complex, large-scale decision problems, with theoretical guarantees and real-world applications.
Findings
The estimator is consistent for the optimal decision system.
It provides finite-sample error bounds and convergence rates.
The method is effective in simulations of influenza spread and mallard population management.
Abstract
There is increasing interest in using streaming data to inform decision making across a wide range of application domains including mobile health, food safety, security, and resource management. A decision support system formalizes online decision making as a map from up-to-date information to a recommended decision. Online estimation of an optimal decision strategy from streaming data requires simultaneous estimation of components of the underlying system dynamics as well as the optimal decision strategy given these dynamics; thus, there is an inherent trade-off between choosing decisions that lead to improved estimates and choosing decisions that appear to be optimal based on current estimates. Thompson (1933) was among the first to formalize this trade-off in the context of choosing between two treatments for a stream of patients; he proposed a simple heuristic wherein a treatment is…
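The abstract describes the exploration/exploitation trade-off in Thompson's original two-treatment setting. The sketch below shows the textbook Beta-Bernoulli form of that heuristic to make the trade-off concrete. It assumes binary patient outcomes and independent Beta(1, 1) priors on each treatment's success probability. The function name `thompson_two_treatments` and the simulated success probabilities are hypothetical. This is classic Thompson sampling for a small problem, not the scalable variant for large decision problems that the paper proposes.

```python
import random


def thompson_two_treatments(true_success_probs, n_patients, seed=0):
    """Textbook Beta-Bernoulli Thompson sampling over a stream of patients.

    Hypothetical illustration: assumes binary outcomes and independent
    Beta(1, 1) priors on each treatment's success probability (these
    modeling choices are assumptions, not taken from the paper).
    Works for any number of treatments; Thompson (1933) considered two.
    Returns (times each treatment was assigned, total observed successes).
    """
    rng = random.Random(seed)
    k = len(true_success_probs)
    successes = [0] * k
    failures = [0] * k
    counts = [0] * k
    total = 0
    for _ in range(n_patients):
        # Draw one plausible success probability per treatment from its posterior.
        draws = [rng.betavariate(1 + successes[a], 1 + failures[a]) for a in range(k)]
        # Assign the treatment with the largest draw; this picks each treatment
        # with probability equal to its posterior probability of being best.
        a = max(range(k), key=lambda i: draws[i])
        counts[a] += 1
        # Simulate a binary outcome and update that treatment's Beta posterior.
        if rng.random() < true_success_probs[a]:
            successes[a] += 1
            total += 1
        else:
            failures[a] += 1
    return counts, total


counts, total = thompson_two_treatments([0.4, 0.6], 2000, seed=0)
print(counts, total)
```

Randomizing in proportion to the posterior probability of being best is what balances the two goals in the abstract. Uncertain treatments are still tried occasionally, which improves their estimates, while the apparently better treatment is assigned to most patients as evidence accumulates.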
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Mobile Crowdsensing and Crowdsourcing
