Arm order recognition in multi-armed bandit problem with laser chaos time series
Naoki Narisawa, Nicolas Chauvet, Mikio Hasegawa, Makoto Naruse

TL;DR
This paper introduces an adaptive algorithm that uses laser chaos time series to solve multi-armed bandit problems. It improves arm order recognition accuracy while almost maintaining the total reward, with applications in resource allocation.
Contribution
It presents a novel adaptive exploration control method based on confidence intervals, enhancing arm order recognition in laser chaos-based MAB algorithms.
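The idea of steering exploration with confidence intervals can be illustrated with a minimal sketch. It is not the authors' laser chaos algorithm: it assumes Bernoulli arms, a Hoeffding-style interval, and Python's pseudo-random generator in place of a laser chaos time series. The names `confidence_radius`, `unresolved_arms`, `run_bandit`, `explore_prob`, and `delta` are hypothetical and chosen only for this illustration.

```python
import math
import random


def confidence_radius(pulls, delta=0.05):
    """Hoeffding-style half-width for the mean of `pulls` rewards in [0, 1].

    Assumption for this sketch; the paper's interval may be defined differently.
    """
    if pulls == 0:
        return float("inf")
    return math.sqrt(math.log(2.0 / delta) / (2.0 * pulls))


def unresolved_arms(means, pulls, delta=0.05):
    """Return the arms whose interval overlaps that of a neighbour in the ranking."""
    order = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    unresolved = set()
    for better, worse in zip(order, order[1:]):
        lower = means[better] - confidence_radius(pulls[better], delta)
        upper = means[worse] + confidence_radius(pulls[worse], delta)
        if lower <= upper:  # intervals overlap: relative order still uncertain
            unresolved.update((better, worse))
    return unresolved


def run_bandit(probs, steps, explore_prob=0.5, delta=0.05, seed=0):
    """Play `steps` rounds; explore only while the arm order is uncertain."""
    rng = random.Random(seed)  # stand-in for a chaotic time series
    k = len(probs)
    pulls = [0] * k
    wins = [0] * k

    def estimates():
        return [wins[i] / pulls[i] if pulls[i] else 0.0 for i in range(k)]

    for _ in range(steps):
        means = estimates()
        candidates = unresolved_arms(means, pulls, delta)
        if candidates and rng.random() < explore_prob:
            # Explore: sample the least-pulled arm whose rank is unresolved.
            arm = min(candidates, key=lambda i: pulls[i])
        else:
            # Exploit: play the arm with the highest estimated reward.
            arm = max(range(k), key=lambda i: means[i])
        reward = 1 if rng.random() < probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward

    means = estimates()
    ranking = sorted(range(k), key=lambda i: means[i], reverse=True)
    return ranking, pulls, sum(wins)


if __name__ == "__main__":
    ranking, pulls, total = run_bandit([0.9, 0.5, 0.1], steps=5000)
    print("estimated order:", ranking)
    print("pulls per arm:", pulls)
    print("total reward:", total)
```

In this sketch, exploration stops once every pair of neighbouring arms in the estimated ranking has non-overlapping intervals, so extra pulls go to arms whose order is still uncertain rather than to all arms uniformly.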
Findings
Improved arm order recognition accuracy.
Reduced dependence on reward environment variations.
Total reward almost maintained relative to conventional MAB methods.
Abstract
By exploiting ultrafast and irregular time series generated by lasers with delayed feedback, we have previously demonstrated a scalable algorithm that solves multi-armed bandit (MAB) problems using time-division multiplexing of laser chaos time series. Although that algorithm detects the arm with the highest reward expectation, it cannot correctly recognize the order of the arms in terms of reward expectations. Here, we present an algorithm in which the degree of exploration is adaptively controlled based on confidence intervals that represent the estimation accuracy of reward expectations. We demonstrate numerically that our approach significantly improves arm order recognition accuracy and reduces dependence on the reward environment, while the total reward is almost maintained compared with conventional MAB methods. This study applies to sectors where the…
Taxonomy
Topics: Receptor Mechanisms and Signaling · Neural Networks and Reservoir Computing · Advanced Bandit Algorithms Research
