Multi-armed Bandit Algorithms on System-on-Chip: Go Frequentist or Bayesian?

S. V. Sai Santosh; Sumit J. Darak

arXiv:2106.02855 · eess.SY · June 8, 2021

TL;DR

This paper develops a reconfigurable, efficient system-on-chip implementation of multi-armed bandit algorithms, enabling adaptive selection between Bayesian and frequentist methods for edge devices with resource constraints.

Contribution

It introduces a reconfigurable framework that intelligently switches between MAB algorithms on SoC, approximates Thompson Sampling for hardware implementation, and analyzes resource efficiency.

Findings

01. RI-MAB outperforms fixed TS and UCB architectures.

02. Efficient approximation of Thompson Sampling enables hardware realization.

03. Significant resource and power savings achieved with reconfigurability.

Abstract

Multi-armed Bandit (MAB) algorithms identify the best arm among multiple arms via an exploration-exploitation trade-off, without prior knowledge of arm statistics. Their usefulness in wireless radio, IoT, and robotics demands deployment on edge devices, and hence a mapping onto a system-on-chip (SoC) is desired. Theoretically, the Bayesian Thompson Sampling (TS) algorithm offers better performance than the frequentist Upper Confidence Bound (UCB) algorithm. However, TS is not synthesizable due to its reliance on the Beta function. We address this problem by approximating it via a pseudo-random number generator-based approach and efficiently realize the TS algorithm on a Zynq SoC. In practice, the type of arm distribution (e.g., Bernoulli, Gaussian, etc.) is unknown, and hence a single algorithm may not be optimal. We propose a reconfigurable and intelligent MAB (RI-MAB) framework.…
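To make the TS versus UCB contrast in the abstract concrete, here is a minimal software sketch for Bernoulli arms. It is not the authors' hardware design, and the abstract does not specify how their approximation works. As a stand-in, the sketch draws Beta samples from uniform pseudo-random numbers only, using the standard fact that the a-th smallest of (a + b - 1) independent Uniform(0, 1) draws follows Beta(a, b) for positive integers a and b. The function names (`beta_sample_order_stat`, `run_thompson`, `run_ucb`) and the arm means are hypothetical, chosen for illustration.

```python
import math
import random


def beta_sample_order_stat(a, b, rng):
    """Draw from Beta(a, b) for positive integers a, b using only uniforms.

    Assumption for illustration (not the paper's method): the a-th smallest
    of (a + b - 1) independent Uniform(0, 1) draws is Beta(a, b) distributed.
    """
    draws = sorted(rng.random() for _ in range(a + b - 1))
    return draws[a - 1]


def run_thompson(means, horizon, rng):
    """Thompson Sampling for Bernoulli arms with Beta(1, 1) priors."""
    k = len(means)
    wins = [0] * k
    losses = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        # Sample one plausible mean per arm from its posterior, play the best.
        samples = [
            beta_sample_order_stat(wins[i] + 1, losses[i] + 1, rng)
            for i in range(k)
        ]
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < means[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


def run_ucb(means, horizon, rng):
    """UCB1: play each arm once, then maximize mean plus confidence bonus."""
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization round: try every arm once
        else:
            arm = max(
                range(k),
                key=lambda i: totals[i] / pulls[i]
                + math.sqrt(2 * math.log(t) / pulls[i]),
            )
        reward = 1 if rng.random() < means[arm] else 0
        totals[arm] += reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    arm_means = [0.1, 0.5, 0.9]  # hypothetical arm statistics
    print("TS pulls: ", run_thompson(arm_means, 1000, random.Random(0)))
    print("UCB pulls:", run_ucb(arm_means, 1000, random.Random(0)))
```

Both routines return per-arm pull counts, so the share of pulls spent on the best arm can be compared directly. Note that the order-statistic sampler needs more uniform draws as an arm accumulates pulls, so it is a readable illustration rather than a resource-efficient design.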

Taxonomy

Topics: Machine Learning and Algorithms · Advanced Bandit Algorithms Research · Optimization and Search Problems

Methods: Spatio-temporal stability analysis