Oracle-Efficient Reinforcement Learning for Max Value Ensembles
Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell

TL;DR
This paper introduces an efficient reinforcement learning algorithm that leverages a collection of heuristic policies to compete with the max-following policy, achieving scalable learning without requiring access to value functions.
Contribution
The work presents a novel algorithm that learns to compete with the max-following policy using only constituent policies and an ERM oracle, without needing the global optimal policy.
Findings
Algorithm effectively competes with the max-following policy
Theoretical guarantees rely on minimal assumptions
Demonstrated success in robotic simulation environments
Abstract
Reinforcement learning (RL) in large or infinite state spaces is notoriously challenging, both theoretically (where worst-case sample and computational complexities must scale with state space cardinality) and experimentally (where function approximation and policy gradient techniques often scale poorly and suffer from instability and high variance). One line of research attempting to address these difficulties makes the natural assumption that we are given a collection of heuristic base or constituent policies upon which we would like to improve in a scalable manner. In this work we aim to compete with the max-following policy, which at each state follows the action of whichever constituent policy has the highest value. The max-following policy is always at least as good as the best constituent policy, and may be considerably better. Our main result is an…
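To make the definition of the max-following policy concrete, here is a minimal sketch on a toy deterministic environment. It is not the paper's algorithm: the paper works without access to value functions, whereas this sketch computes exact values by rollout, which is only possible because the toy dynamics are known. The environment, the two constituent policies, and all names (`step`, `policy_a`, `policy_b`, `rollout_value`, `max_following_return`) are assumptions made up for illustration.

```python
from typing import Callable, List, Tuple

N_STATES = 6   # toy assumption: states 0..5 arranged on a line
HORIZON = 10   # toy assumption: fixed episode length

Policy = Callable[[int], int]  # state -> action (0 = stay, 1 = move right)


def step(state: int, action: int) -> Tuple[int, float]:
    """Hypothetical deterministic dynamics: reward 1 per successful move right."""
    if action == 1 and state < N_STATES - 1:
        return state + 1, 1.0
    return state, 0.0


def policy_a(state: int) -> int:
    """Constituent policy that only makes progress in the left half."""
    return 1 if state < 3 else 0


def policy_b(state: int) -> int:
    """Constituent policy that only makes progress in the right half."""
    return 1 if state >= 3 else 0


def rollout_value(policy: Policy, state: int, h: int) -> float:
    """Return collected by following `policy` from `state` at step h until the horizon."""
    total = 0.0
    for _ in range(h, HORIZON):
        state, reward = step(state, policy(state))
        total += reward
    return total


def max_following_action(policies: List[Policy], state: int, h: int) -> int:
    """Take the action of whichever constituent policy has the highest value here."""
    values = [rollout_value(p, state, h) for p in policies]
    best = max(range(len(policies)), key=lambda k: values[k])  # ties go to the lowest index
    return policies[best](state)


def max_following_return(policies: List[Policy], start: int) -> float:
    """Total reward of the max-following policy over one episode."""
    state, total = start, 0.0
    for h in range(HORIZON):
        state, reward = step(state, max_following_action(policies, state, h))
        total += reward
    return total


if __name__ == "__main__":
    policies = [policy_a, policy_b]
    print("policy_a alone:", rollout_value(policy_a, 0, 0))
    print("policy_b alone:", rollout_value(policy_b, 0, 0))
    print("max-following: ", max_following_return(policies, 0))
```

In this toy setup each constituent policy stalls in the region where it is weak, while max-following switches from `policy_a` to `policy_b` at state 3 and so collects more reward than either one alone, matching the abstract's remark that max-following may be considerably better than the best constituent policy.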
Taxonomy
Topics: Blockchain Technology Applications and Security · Traffic control and management · Reinforcement Learning in Robotics
Methods: Balanced Selection
