Efficient Contextual Bandits with Continuous Actions
Maryam Majzoubi, Chicheng Zhang, Rajan Chari, Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins

TL;DR
This paper introduces a computationally efficient algorithm for contextual bandits with continuous actions of unknown structure. The algorithm composes with most supervised learning representations and is supported by both theoretical analysis and large-scale experiments.
Contribution
It presents a novel reduction-style algorithm for continuous action contextual bandits that is both computationally tractable and broadly applicable.
Findings
Proves the algorithm's effectiveness in a general setting
Demonstrates scalability with large-scale experiments
Shows compatibility with most supervised learning representations
Abstract
We create a computationally tractable algorithm for contextual bandits with continuous actions having unknown structure. Our reduction-style algorithm composes with most supervised learning representations. We prove that it works in a general sense and verify the new functionality with large-scale experiments.
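The abstract does not spell out the algorithm's mechanics, so the following is only a rough illustration of what a reduction-style treatment of continuous actions can look like. It is not the authors' implementation. Everything in it is an assumption made for the example: the action range [0, 1], the discretization into `k` bin centers, the uniform smoothing band of half-width `h`, epsilon-greedy exploration, the toy cost function, and the table of running sums that stands in for a supervised learner.

```python
import random

def bin_centers(k):
    """Centers of k equal-width bins covering the assumed action range [0, 1]."""
    return [(i + 0.5) / k for i in range(k)]

def band(center, h):
    """Smoothing interval around a center, clipped to [0, 1]."""
    return max(0.0, center - h), min(1.0, center + h)

def action_density(a, greedy_center, h, eps):
    """Density of the exploration policy at action a: with probability eps the
    action is uniform on [0, 1], otherwise uniform on the greedy center's band."""
    lo, hi = band(greedy_center, h)
    in_band = 1.0 if lo <= a <= hi else 0.0
    return eps + (1.0 - eps) * in_band / (hi - lo)

def sample_action(greedy_center, h, eps, rng):
    """Draw a continuous action and return it with its density."""
    if rng.random() < eps:
        a = rng.random()
    else:
        a = rng.uniform(*band(greedy_center, h))
    return a, action_density(a, greedy_center, h, eps)

def smoothed_cost_estimates(a, cost, density, centers, h):
    """Importance-weighted estimate of each center's smoothed cost, i.e. the
    average cost of playing uniformly within that center's band."""
    estimates = []
    for c in centers:
        lo, hi = band(c, h)
        kernel = 1.0 / (hi - lo) if lo <= a <= hi else 0.0
        estimates.append(cost * kernel / density)
    return estimates

def run(rounds=20000, k=8, h=0.1, eps=0.2, seed=0):
    """Toy loop with two contexts; returns the greedy center per context."""
    rng = random.Random(seed)
    centers = bin_centers(k)
    targets = {0: 0.2, 1: 0.8}  # hypothetical best action for each context
    # Stand-in for a supervised learner: per-context sums of cost estimates.
    # Every center is updated on every round of its context, so comparing
    # sums is equivalent to comparing means.
    sums = {x: [0.0] * k for x in targets}
    for _ in range(rounds):
        x = rng.randrange(2)
        greedy = min(range(k), key=lambda j: sums[x][j])
        a, p = sample_action(centers[greedy], h, eps, rng)
        cost = abs(a - targets[x])  # toy cost, observed only for the played action
        for j, e in enumerate(smoothed_cost_estimates(a, cost, p, centers, h)):
            sums[x][j] += e
    return {x: centers[min(range(k), key=lambda j: sums[x][j])] for x in targets}
```

Calling `run()` returns, for each toy context, the bin center the learner currently prefers. Because the exploration density is at least `eps` everywhere, the importance-weighted estimates are unbiased for the smoothed costs. The width `h` trades that variance against how closely the smoothed cost tracks the true one.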
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
