Multi-armed Bandit Algorithms on System-on-Chip: Go Frequentist or Bayesian?
S. V. Sai Santosh, Sumit J. Darak

TL;DR
This paper develops a reconfigurable, efficient system-on-chip (SoC) implementation of multi-armed bandit (MAB) algorithms that adaptively selects between Bayesian and frequentist methods on resource-constrained edge devices.
Contribution
It introduces a reconfigurable framework that intelligently switches between MAB algorithms on an SoC, approximates Thompson Sampling (TS) so that it can be implemented in hardware, and analyzes the resulting resource efficiency.
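The following is a minimal sketch of how a pseudo-random number generator (PRNG) can replace exact Beta sampling in Thompson Sampling. It assumes Bernoulli arms with integer Beta(a, b) posteriors. It relies on the order-statistics identity (the a-th smallest of a + b - 1 i.i.d. Uniform(0, 1) draws is Beta(a, b) distributed) and uses a 16-bit Galois LFSR as a hardware-style PRNG. The names `GaloisLFSR16`, `beta_sample_order_stat`, and `thompson_select` are hypothetical. This is not claimed to be the approximation the authors implement on the Zynq SoC.

```python
# Hypothetical sketch: PRNG-based Beta sampling for Thompson Sampling.
# Assumption: posteriors are Beta(a, b) with integer a, b >= 1. This is NOT
# claimed to be the authors' approximation; it only illustrates the idea.


class GaloisLFSR16:
    """16-bit Galois LFSR (taps 16, 14, 13, 11 -> mask 0xB400), period 65535."""

    def __init__(self, seed: int = 0xACE1) -> None:
        if seed & 0xFFFF == 0:
            raise ValueError("LFSR seed must be non-zero")
        self.state = seed & 0xFFFF

    def step(self) -> None:
        # One clock: shift right and apply the feedback mask if a 1 fell out.
        lsb = self.state & 1
        self.state >>= 1
        if lsb:
            self.state ^= 0xB400

    def uniform(self) -> float:
        # Clock 16 times so successive outputs do not share most of their bits,
        # then map the (never zero) 16-bit state into the open interval (0, 1).
        for _ in range(16):
            self.step()
        return self.state / 65536.0


def beta_sample_order_stat(a: int, b: int, rng: GaloisLFSR16) -> float:
    """Return the a-th smallest of (a + b - 1) uniforms, i.e. a Beta(a, b) draw."""
    if a < 1 or b < 1:
        raise ValueError("a and b must be positive integers")
    draws = sorted(rng.uniform() for _ in range(a + b - 1))
    return draws[a - 1]


def thompson_select(successes, failures, rng: GaloisLFSR16) -> int:
    """Pick the arm whose Beta(successes + 1, failures + 1) sample is largest."""
    samples = [
        beta_sample_order_stat(s + 1, f + 1, rng)
        for s, f in zip(successes, failures)
    ]
    return max(range(len(samples)), key=samples.__getitem__)
```

Note that this approach needs a + b - 1 uniforms per sample, so its cost grows with the number of pulls. A fixed-latency hardware design would likely have to bound or rescale the success and failure counts. In exchange, it uses only a PRNG, comparisons, and sorting, and needs no Beta function evaluation.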
Findings
RI-MAB outperforms fixed TS and UCB architectures.
Efficient approximation of Thompson Sampling enables hardware realization.
Reconfigurability yields significant resource and power savings.
Abstract
Multi-armed Bandit (MAB) algorithms identify the best arm among multiple arms via an exploration-exploitation trade-off, without prior knowledge of the arm statistics. Their usefulness in wireless radio, IoT, and robotics demands deployment on edge devices, and hence a mapping onto a system-on-chip (SoC) is desired. Theoretically, the Bayesian Thompson Sampling (TS) algorithm offers better performance than the frequentist Upper Confidence Bound (UCB) algorithm. However, TS is not synthesizable because it requires sampling from the Beta distribution. We address this problem by approximating it via a pseudo-random number generator-based approach and efficiently realize the TS algorithm on a Zynq SoC. In practice, the type of arm distribution (e.g., Bernoulli, Gaussian, etc.) is unknown, and hence a single algorithm may not be optimal. We propose a reconfigurable and intelligent MAB (RI-MAB) framework.…
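To make the frequentist/Bayesian contrast concrete, the following is a small software simulation of the two policies on Bernoulli arms. It assumes the textbook UCB1 index (empirical mean plus sqrt(2 ln t / n)) and Bernoulli TS with a Beta(1, 1) prior. The paper's exact UCB variant and its hardware datapath may differ. The helper names `ucb1_select`, `ts_select`, and `run` are hypothetical. The TS step calls Python's `random.betavariate`, which is precisely the exact Beta sampling that the abstract says cannot be synthesized directly.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """UCB1 (frequentist): play each arm once, then maximise mean + sqrt(2 ln t / n)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def ts_select(counts, sums, rng):
    """Bernoulli Thompson Sampling (Bayesian) with a Beta(1, 1) prior per arm."""
    # sums[a] counts successes, counts[a] - sums[a] counts failures.
    samples = [rng.betavariate(s + 1, n - s + 1) for n, s in zip(counts, sums)]
    return max(range(len(samples)), key=samples.__getitem__)


def run(policy, probs, horizon, seed=0):
    """Simulate Bernoulli arms; return (cumulative pseudo-regret, pull counts)."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    sums = [0] * len(probs)
    best = max(probs)
    regret = 0.0
    for t in range(1, horizon + 1):
        if policy == "ucb":
            arm = ucb1_select(counts, sums, t)
        else:
            arm = ts_select(counts, sums, rng)
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - probs[arm]
    return regret, counts
```

For example, `run("ucb", [0.2, 0.5, 0.7], 5000)` and `run("ts", [0.2, 0.5, 0.7], 5000)` can be compared directly. On Bernoulli arms, TS typically accumulates less regret, in line with the theoretical claim above. A single seed is only an illustration, and the comparison can change when the rewards are not Bernoulli.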
Taxonomy
Topics: Machine Learning and Algorithms · Advanced Bandit Algorithms Research · Optimization and Search Problems
Methods: Spatio-temporal stability analysis
