Contextual Bandits for Resource-Constrained Devices using Probabilistic Learning
Marco Angioli, Kevin Johansson, Antonello Rosato, Amy Loutfi, Denis Kleyko

TL;DR
This paper introduces probabilistic HD-CB, a low-precision, resource-efficient contextual bandit algorithm suitable for deployment on constrained devices, improving scalability and decision quality.
Contribution
It proposes a probabilistic update rule for hyperdimensional CB, enabling low-precision operation without overflow and maintaining high decision accuracy.
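The Python sketch below shows what a probabilistic, saturating update could look like. It is not the authors' rule. The following are all illustrative assumptions:

- the 3-bit signed range;
- the per-component update probability `p`;
- the bipolar context hypervector and the binary reward;
- the names `probabilistic_update` and `select_action`.

Each component is touched only with probability `p` and is clipped to the representable range, so stored values can never overflow.

```python
import random

# Assumed precision: 3-bit signed components, i.e. integer values in [-4, 3].
BITS = 3
LO, HI = -(2 ** (BITS - 1)), 2 ** (BITS - 1) - 1


def probabilistic_update(action_vec, context_hv, reward, p, rng):
    """Hypothetical probabilistic update (illustrative, not the paper's exact rule).

    Each component is updated with probability p: it moves one step in the
    direction of reward * context component and is saturated to [LO, HI].
    Returns how many components were touched, i.e. the cost of this update.
    """
    touched = 0
    for i, x in enumerate(context_hv):
        if rng.random() < p:
            signal = reward * x
            step = (signal > 0) - (signal < 0)  # sign of signal: -1, 0 or +1
            action_vec[i] = max(LO, min(HI, action_vec[i] + step))
            touched += 1
    return touched


def select_action(action_vecs, context_hv):
    """Pick the action whose vector has the largest dot product with the context."""
    scores = [sum(a * x for a, x in zip(vec, context_hv)) for vec in action_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# Usage: D-dimensional bipolar context, two actions, only action 1 is rewarded.
rng = random.Random(0)
D, p = 1000, 0.1
context = [rng.choice((-1, 1)) for _ in range(D)]
actions = [[0] * D for _ in range(2)]
for _ in range(50):
    touched = probabilistic_update(actions[1], context, reward=1, p=p, rng=rng)
best = select_action(actions, context)
```

With `p = 0.1`, each call touches about a tenth of the components on average. This illustrates how update cost can scale with the fraction of components updated.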
Findings
Probabilistic HD-CB outperforms binarized HD-CB at equal precision.
It approaches the performance of full HD-CB with only 3 bits per component.
The method reduces update costs proportionally to the fraction of components updated.
Abstract
Contextual bandits (CB) are online sequential decision-making problems under partial feedback that underpin many adaptive services. There is growing demand to deploy CB agents directly on-device, under strict constraints on memory, compute, and energy. However, standard linear CB algorithms are often impractical for resource-constrained devices because of their unfavorable scaling in computational and memory costs. HD-CB, a recently proposed CB approach based on hyperdimensional computing principles, models and solves CB problems by moving into high-dimensional spaces. Compared to linear CB algorithms, HD-CB offers faster convergence, favorable scalability, and improved memory efficiency. However, its learning rule is accumulation-based: the values of action vectors grow over time, requiring high precision. While periodic binarization can prevent overflow in low-precision…
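A toy accumulation rule can illustrate the overflow problem. This sketch assumes HD-CB's details rather than reproducing them. An action vector adds `reward * context` on every update, so component magnitudes grow linearly with the number of rewarded updates and eventually leave an 8-bit signed range. `binarize` shows the periodic sign-binarization workaround mentioned in the abstract. The names `accumulate`, `overflows_int8`, and `binarize`, and the int8 storage format, are hypothetical.

```python
import random

INT8_MIN, INT8_MAX = -128, 127  # assumed storage format for each component


def accumulate(action_vec, context_hv, reward):
    """Accumulation-style update (assumed): add reward * context to the action vector."""
    for i, x in enumerate(context_hv):
        action_vec[i] += reward * x


def overflows_int8(action_vec):
    """True if any component no longer fits in a signed 8-bit integer."""
    return any(v < INT8_MIN or v > INT8_MAX for v in action_vec)


def binarize(action_vec):
    """Collapse each component to its sign (+1/-1), mapping 0 to +1."""
    return [1 if v >= 0 else -1 for v in action_vec]


rng = random.Random(1)
D = 256
context = [rng.choice((-1, 1)) for _ in range(D)]
action = [0] * D
first_overflow = None
for step in range(1, 201):
    accumulate(action, context, reward=1)
    if first_overflow is None and overflows_int8(action):
        first_overflow = step  # first update at which int8 storage would overflow
binary = binarize(action)
```

Binarizing keeps values in range but discards the accumulated magnitudes. The Findings above report that probabilistic HD-CB outperforms binarized HD-CB at equal precision.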
