Understanding the stochastic dynamics of sequential decision-making processes: A path-integral analysis of multi-armed bandits
Bo Li, Chi Ho Yeung

TL;DR
This paper applies statistical-physics techniques to analyze the stochastic dynamics of multi-armed bandit models. It characterizes finite-time regret distributions and reveals complex behaviors such as multimodality and large regret caused by early exploitation errors.
Contribution
It introduces a novel analytical approach using path-integral methods to characterize finite-time regret distributions in MAB models, bridging decision-making and physics.
Findings
The finite-time regret distribution is multimodal.
Large regrets often stem from early sub-optimal exploitation.
Analytical results align well with simulations.
Abstract
The multi-armed bandit (MAB) model is one of the most classical models for studying decision-making in an uncertain environment. In this model, a player chooses one of K possible arms of a bandit machine to play at each time step, and the chosen arm returns a random reward to the player, potentially drawn from a specific unknown distribution. The player's goal is to collect as much reward as possible during the process. Despite its simplicity, the MAB model offers an excellent playground for studying the trade-off between exploration and exploitation and for designing effective algorithms for sequential decision-making under uncertainty. Although many asymptotically optimal algorithms have been established, the finite-time behavior of the stochastic dynamics of the MAB model appears much more challenging to analyze, due to the intertwining of the decision-making and the…
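The setup described in the abstract can be illustrated with a small simulation that collects the regret of many independent runs, which is the quantity whose finite-time distribution the paper studies. This is only a minimal sketch under stated assumptions: rewards are taken to be Bernoulli, and the player follows a plain epsilon-greedy rule, which is a stand-in policy and not necessarily the algorithm analyzed by the authors. The names `simulate_regret` and `regret_samples`, and all parameter values, are hypothetical.

```python
import random


def simulate_regret(means, horizon, epsilon=0.1, seed=None):
    """Play one epsilon-greedy run on a Bernoulli bandit.

    Returns the cumulative pseudo-regret: the sum over time steps of
    (best mean reward - mean reward of the arm actually played).
    Bernoulli rewards and epsilon-greedy are assumptions of this sketch.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of times each arm was played
    sums = [0.0] * k      # total reward collected from each arm
    best = max(means)
    regret = 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # initial round: play every arm once
        elif rng.random() < epsilon:
            arm = rng.randrange(k)  # explore uniformly at random
        else:
            # exploit the arm with the highest empirical mean so far
            arm = max(range(k), key=lambda a: sums[a] / counts[a])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret


def regret_samples(means, horizon, runs, epsilon=0.1, seed=0):
    """Collect the regret of `runs` independent runs (one seed per run)."""
    return [simulate_regret(means, horizon, epsilon, seed + r) for r in range(runs)]


if __name__ == "__main__":
    samples = regret_samples([0.5, 0.6], horizon=200, runs=1000)
    print("min/mean/max regret:", min(samples), sum(samples) / len(samples), max(samples))
```

Plotting a histogram of `samples` is one way to inspect the shape of the empirical finite-time regret distribution for a given policy and horizon. Whether this particular policy reproduces the multimodality reported in the paper is not something the sketch guarantees.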
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Gaussian Processes and Bayesian Inference
