Bandit algorithms to emulate human decision making using probabilistic distortions
Ravi Kumar Kolla, Prashanth L.A., Aditya Gopalan, Krishna Jagannathan, Michael Fu, Steve Marcus

TL;DR
This paper introduces new bandit algorithms that incorporate probabilistic distortions to better emulate human decision-making, providing theoretical guarantees and demonstrating advantages in traffic routing scenarios.
Contribution
It develops distortion-aware bandit algorithms for regret minimization and best arm identification, with proven optimality and applicability to human-like decision models.
Findings
Algorithms achieve sublinear regret with distortion considerations.
Algorithms are proven order-optimal for both regret minimization and best arm identification.
Simulation shows improved performance in traffic routing applications.
Abstract
Motivated by models of human decision making proposed to explain commonly observed deviations from conventional expected value preferences, we formulate two stochastic multi-armed bandit problems with distorted probabilities on the reward distributions: the classic K-armed bandit and the linearly parameterized bandit settings. We consider the aforementioned problems in the regret minimization as well as best arm identification framework for multi-armed bandits. For the regret minimization setting in K-armed as well as linear bandit problems, we propose algorithms that are inspired by Upper Confidence Bound (UCB) algorithms, incorporate reward distortions, and exhibit sublinear regret. For the K-armed bandit setting, we derive an upper bound on the expected regret for our proposed algorithm, and then we prove a matching lower bound to establish the order-optimality of our…
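To make the idea of a UCB-style algorithm with distorted probabilities concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm: the weight function `tk_weight` (a Tversky-Kahneman-style inverse-S curve with parameter `eta`), the order-statistics estimator `distorted_value` for non-negative rewards, the standard UCB1-style bonus, and the names `distorted_ucb` and `bonus_scale` are all hypothetical choices made for this example. The confidence width the authors actually analyze may differ.

```python
import math
import random


def tk_weight(p, eta=0.61):
    """Inverse-S probability weight (assumed form): overweights small p."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    num = p ** eta
    return num / ((num + (1.0 - p) ** eta) ** (1.0 / eta))


def distorted_value(samples, weight=tk_weight):
    """Estimate the integral of weight(P(X > z)) dz for non-negative samples.

    Uses the empirical distribution: the i-th smallest sample (0-indexed)
    receives mass weight((n - i) / n) - weight((n - i - 1) / n).
    With the identity weight this reduces to the sample mean.
    """
    xs = sorted(samples)
    n = len(xs)
    return sum(
        x * (weight((n - i) / n) - weight((n - i - 1) / n))
        for i, x in enumerate(xs)
    )


def distorted_ucb(arms, horizon, weight=tk_weight, bonus_scale=1.0, seed=0):
    """UCB-style loop on distorted value estimates (illustrative only).

    `arms` is a list of callables taking a random.Random and returning a
    non-negative reward. The exploration bonus is the familiar UCB1 form,
    used here as a placeholder assumption.
    """
    rng = random.Random(seed)
    history = [[] for _ in arms]
    for t in range(1, horizon + 1):
        if t <= len(arms):
            k = t - 1  # pull every arm once to initialize
        else:
            k = max(
                range(len(arms)),
                key=lambda a: distorted_value(history[a], weight)
                + bonus_scale * math.sqrt(2.0 * math.log(t) / len(history[a])),
            )
        history[k].append(arms[k](rng))
    return [len(h) for h in history]


if __name__ == "__main__":
    # Two hypothetical arms: a safe payoff and a rare large payoff.
    safe = lambda rng: 0.5
    risky = lambda rng: 4.0 if rng.random() < 0.1 else 0.0
    print(distorted_ucb([safe, risky], horizon=500))
```

Because the weight function inflates small probabilities, an arm with a rare large payoff can look more attractive under the distorted value than under the plain mean, which is the kind of deviation from expected-value preferences the abstract refers to.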
Taxonomy
Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Adversarial Robustness in Machine Learning
