Distributed Online Learning via Cooperative Contextual Bandits
Cem Tekin, Mihaela van der Schaar

TL;DR
This paper introduces a decentralized framework for cooperative contextual bandits where multiple learners make online decisions, share information, and balance rewards and costs, with proven sublinear regret bounds.
Contribution
It proposes a novel cooperative decentralized online learning framework with algorithms and theoretical analysis for multi-learner contextual bandits.
Findings
The regret of the proposed algorithms is proven to be sublinear in time.
The framework effectively balances reward maximization and cooperation costs.
The framework is applicable to large-scale data mining and distributed recommendation systems.
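For readers unfamiliar with the term, "sublinear in time" can be written out as follows. The notation here is generic and assumed for illustration: R(T) stands for the cumulative regret after T time steps and gamma for an unspecified exponent. The paper's specific bound and exponent are not given in this summary.

```latex
% Generic illustration of sublinear regret; R(T), C and \gamma are assumed symbols.
\[
  R(T) \le C\,T^{\gamma} \ \text{ for some constants } C > 0,\ 0 \le \gamma < 1
  \quad\Longrightarrow\quad
  \lim_{T \to \infty} \frac{R(T)}{T} = 0 .
\]
```

In words, the average per-step loss relative to the benchmark vanishes as the horizon grows.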
Abstract
In this paper we propose a novel framework for decentralized, online learning by many learners. At each moment of time, an instance characterized by a certain context may arrive to each learner; based on the context, the learner can select one of its own actions (which gives a reward and provides information) or request assistance from another learner. In the latter case, the requester pays a cost and receives the reward but the provider learns the information. In our framework, learners are modeled as cooperative contextual bandits. Each learner seeks to maximize the expected reward from its arrivals, which involves trading off the reward received from its own actions, the information learned from its own actions, the reward received from the actions requested of others and the cost paid for these actions - taking into account what it has learned about the value of assistance from each…
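The interaction described in the abstract can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' algorithm: the `Learner` class, the epsilon-greedy rule, the uniform binning of a scalar context in [0, 1], the Bernoulli rewards, and all parameter values are hypothetical choices made here for clarity. It shows only the protocol: a learner either pulls one of its own arms or calls a peer, in which case the requester pays a cost and receives the reward while the provider updates its own estimates.

```python
import random


class Learner:
    """Hypothetical learner: epsilon-greedy over its own arms and peer calls."""

    def __init__(self, name, arm_probs, call_cost=0.1, n_bins=4, epsilon=0.1, seed=0):
        self.name = name
        self.arm_probs = arm_probs  # one function per arm: context -> P(reward = 1)
        self.call_cost = call_cost  # assumed fixed cost paid per request to a peer
        self.n_bins = n_bins        # assumed uniform partition of the context space
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.peers = {}             # peer name -> Learner
        self.stats = {}             # (bin, choice) -> (count, mean observed value)

    def _bin(self, context):
        return min(int(context * self.n_bins), self.n_bins - 1)

    def _estimate(self, b, choice):
        return self.stats.get((b, choice), (0, 0.0))[1]

    def _update(self, b, choice, value):
        count, mean = self.stats.get((b, choice), (0, 0.0))
        count += 1
        self.stats[(b, choice)] = (count, mean + (value - mean) / count)

    def _pick(self, b, choices):
        # Explore uniformly with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(choices)
        return max(choices, key=lambda c: self._estimate(b, c))

    def _pull(self, context, arm):
        # Bernoulli reward; whoever pulls the arm learns from the outcome.
        reward = 1.0 if self.rng.random() < self.arm_probs[arm](context) else 0.0
        self._update(self._bin(context), ("arm", arm), reward)
        return reward

    def serve(self, context):
        # Provider side: pick one of its own arms and learn from the result.
        arms = [("arm", i) for i in range(len(self.arm_probs))]
        _, arm = self._pick(self._bin(context), arms)
        return self._pull(context, arm)

    def step(self, context):
        # Requester side: choose between own arms and calling a peer.
        b = self._bin(context)
        choices = [("arm", i) for i in range(len(self.arm_probs))]
        choices += [("peer", name) for name in self.peers]
        kind, target = self._pick(b, choices)
        if kind == "arm":
            return self._pull(context, target)
        net = self.peers[target].serve(context) - self.call_cost
        self._update(b, (kind, target), net)  # requester learns the peer's net value
        return net


# Toy run: learner A has a weak arm, learner B a strong one.
a = Learner("A", [lambda x: 0.2], seed=1)
b = Learner("B", [lambda x: 0.9], seed=2)
a.peers["B"] = b
ctx_rng = random.Random(3)
total = sum(a.step(ctx_rng.random()) for _ in range(2000))
print(f"net reward collected by A over 2000 steps: {total:.1f}")
```

The requester tracks the net value (reward minus cost) of calling each peer per context bin, which is one simple way to express the trade-off between own actions and paid assistance that the abstract describes.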
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
