Oracle-Efficient Reinforcement Learning for Max Value Ensembles

Marcel Hussing; Michael Kearns; Aaron Roth; Sikata Bela Sengupta; Jessica Sorrell

arXiv:2405.16739 · cs.LG · May 28, 2024

TL;DR

This paper introduces an efficient reinforcement learning algorithm that leverages a collection of heuristic policies to compete with the max-following policy, achieving scalable learning without requiring access to value functions.

Contribution

The work presents a novel algorithm that learns to compete with the max-following policy using only constituent policies and an ERM oracle, without needing the global optimal policy.

Findings

01. Algorithm effectively competes with the max-following policy

02. Theoretical guarantees rely on minimal assumptions

03. Demonstrated success in robotic simulation environments

Abstract

Reinforcement learning (RL) in large or infinite state spaces is notoriously challenging, both theoretically (where worst-case sample and computational complexities must scale with state space cardinality) and experimentally (where function approximation and policy gradient techniques often scale poorly and suffer from instability and high variance). One line of research attempting to address these difficulties makes the natural assumption that we are given a collection of heuristic base or "constituent" policies upon which we would like to improve in a scalable manner. In this work we aim to compete with the "max-following policy", which at each state follows the action of whichever constituent policy has the highest value. The max-following policy is always at least as good as the best constituent policy, and may be considerably better. Our main result is an…
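To make the definition of the max-following policy concrete, here is a minimal Python sketch. It is an illustration of the definition only, not the paper's algorithm: it assumes each constituent policy's value function is handed to us as a plain function, whereas the paper's setting does not provide value functions. The names `max_following_action`, `policies`, and `value_fns` are hypothetical, and the sketch ignores any dependence of values on the time step.

```python
from typing import Callable, Sequence

State = int
Action = int
Policy = Callable[[State], Action]
ValueFn = Callable[[State], float]


def max_following_action(
    state: State,
    policies: Sequence[Policy],
    value_fns: Sequence[ValueFn],
) -> Action:
    """Return the action of the constituent policy with the highest value at `state`.

    ASSUMPTION: value_fns[k](state) is the value of policies[k] at `state`.
    These values are supplied directly here purely for illustration.
    """
    if not policies or len(policies) != len(value_fns):
        raise ValueError("need one value function per constituent policy")
    # Index of the constituent policy with the largest value at this state
    # (ties go to the lowest index).
    best = max(range(len(policies)), key=lambda k: value_fns[k](state))
    return policies[best](state)


# Toy example with two constituent policies on integer states.
policies = [lambda s: 0, lambda s: 1]            # policy 0 always plays action 0, policy 1 always plays 1
value_fns = [
    lambda s: 1.0 if s < 5 else 0.0,             # policy 0 is good on states below 5
    lambda s: 0.5,                               # policy 1 is mediocre everywhere
]

action_low = max_following_action(2, policies, value_fns)   # follows policy 0
action_high = max_following_action(7, policies, value_fns)  # follows policy 1
```

In this toy example the rule switches between the two constituent policies depending on the state, which is how a max-following policy can outperform every individual constituent.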

Videos

Oracle-Efficient Reinforcement Learning for Max Value Ensembles · SlidesLive

Taxonomy

Topics: Blockchain Technology Applications and Security · Traffic control and management · Reinforcement Learning in Robotics

Methods: Balanced Selection