Delegative Reinforcement Learning: learning to avoid traps with a little help
Vanessa Kosoy

TL;DR
This paper introduces Delegative Reinforcement Learning (DRL), a framework in which the algorithm may delegate individual actions to an external advisor in order to avoid traps. This yields a regret bound without the usual episodic or trap-free assumptions.
Contribution
It presents the first regret analysis for a model-based RL setting that allows action delegation, extending RL theory to non-episodic environments that contain traps.
Findings
Derived a regret bound for DRL without assuming episodes or a trap-free environment.
Developed a variant of Posterior Sampling Reinforcement Learning (PSRL) with a subroutine that decides which actions to delegate.
The analysis is limited to finite MDPs with finite numbers of hypotheses, states, and actions.
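The regret bound in the findings concerns a single non-episodic interaction with time discount γ. As a hedged sketch, using notation chosen here rather than taken from the paper, one common way to write normalized discounted regret for a hypothesis class ℋ of finite MDPs with true environment υ is:

```latex
U_\gamma = (1-\gamma)\sum_{n=0}^{\infty} \gamma^{n} r_n,
\qquad
\mathrm{Reg}_{\upsilon}(\pi,\gamma)
  = \max_{\pi^{*}} \mathbb{E}^{\pi^{*}}_{\upsilon}\!\left[U_\gamma\right]
  - \mathbb{E}^{\pi}_{\upsilon}\!\left[U_\gamma\right],
\qquad \upsilon \in \mathcal{H}.
```

Under this notation, a regret bound says that $\max_{\upsilon \in \mathcal{H}} \mathrm{Reg}_{\upsilon}(\pi_\gamma, \gamma) \to 0$ as $\gamma \to 1$, where $\pi_\gamma$ is the algorithm with parameters tuned for discount γ (the tuning is why the algorithm is not anytime). In the delegative setting, $\pi$ would be the combined behavior of agent and advisor. Without an advisor, a trap that exists under some hypothesis in ℋ can keep this quantity bounded away from zero, because a single exploratory action may be irreversible.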
Abstract
Most known regret bounds for reinforcement learning are either episodic or assume an environment without traps. We derive a regret bound without making either assumption, by allowing the algorithm to occasionally delegate an action to an external advisor. We thus arrive at a setting of active one-shot model-based reinforcement learning that we call DRL (delegative reinforcement learning). The algorithm we construct in order to demonstrate the regret bound is a variant of Posterior Sampling Reinforcement Learning supplemented by a subroutine that decides which actions should be delegated. The algorithm is not anytime, since the parameters must be adjusted according to the target time discount. Currently, our analysis is limited to Markov decision processes with finite numbers of hypotheses, states and actions.
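To make the structure of "posterior sampling plus a delegation subroutine" concrete, here is a toy sketch under loudly stated assumptions. It is not the paper's algorithm. The delegation rule `should_delegate` (veto an action if any hypothesis with posterior weight at least `eta` says it loses more than `eps` in value) is a hypothetical stand-in, the advisor is assumed to act greedily on the true MDP (an illustrative simplification, not the paper's assumption about the advisor), and the hypothesis is resampled every step for brevity, whereas PSRL usually keeps a sample fixed for an interval. All function names, the `SAFE`/`TRAP` MDPs, and the parameter values are hypothetical.

```python
import random

# Toy sketch (hypothetical, not the paper's algorithm): posterior sampling over a
# finite set of MDP hypotheses, plus a stand-in rule for when to delegate.


def q_values(P, R, gamma, iters=500):
    """Value iteration for one hypothesis; P[s][a][s'] transitions, R[s][a] rewards."""
    n_s, n_a = len(R), len(R[0])
    V = [0.0] * n_s
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        Q = [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        V = [max(row) for row in Q]
    return Q


def update_posterior(hyps, weights, s, a, s_next):
    """Bayes rule: reweight each hypothesis by the likelihood of the observed transition."""
    new = [w * h["P"][s][a][s_next] for w, h in zip(weights, hyps)]
    z = sum(new)
    return [x / z for x in new] if z > 0 else list(weights)


def should_delegate(qs, weights, s, a, eta, eps):
    """Hypothetical rule: delegate if any hypothesis with posterior weight >= eta
    says action a loses more than eps versus the best action in state s."""
    return any(w >= eta and max(q[s]) - q[s][a] > eps for q, w in zip(qs, weights))


def drl_step(qs, weights, s, rng, advisor, eta, eps):
    """Sample a hypothesis from the posterior, propose its greedy action, maybe delegate."""
    k = rng.choices(range(len(qs)), weights=weights)[0]
    a = max(range(len(qs[k][s])), key=lambda b: qs[k][s][b])
    if should_delegate(qs, weights, s, a, eta, eps):
        return advisor(s), True
    return a, False


def run(hyps, true_idx, prior, steps, gamma=0.9, seed=0, eta=0.05, eps=0.1):
    rng = random.Random(seed)
    qs = [q_values(h["P"], h["R"], gamma) for h in hyps]
    true = hyps[true_idx]
    # Assumed advisor: greedy w.r.t. the true MDP (illustrative simplification).
    q_true = qs[true_idx]
    advisor = lambda s: max(range(len(q_true[s])), key=lambda b: q_true[s][b])
    weights, s = list(prior), 0
    total_reward, n_delegated, visited = 0.0, 0, [s]
    for _ in range(steps):
        a, delegated = drl_step(qs, weights, s, rng, advisor, eta, eps)
        n_delegated += delegated
        total_reward += true["R"][s][a]
        # Sample the next state from the true environment, then update beliefs.
        s_next = rng.choices(range(len(true["R"])), weights=true["P"][s][a])[0]
        weights = update_posterior(hyps, weights, s, a, s_next)
        s = s_next
        visited.append(s)
    return total_reward, n_delegated, visited


# State 0: action 0 = "stay" (reward 0.5), action 1 = "risky" (reward 1.0).
# State 1 is absorbing with zero reward. Under TRAP, the risky action leads there.
SAFE = {"P": [[[1, 0], [1, 0]], [[0, 1], [0, 1]]], "R": [[0.5, 1.0], [0.0, 0.0]]}
TRAP = {"P": [[[1, 0], [0, 1]], [[0, 1], [0, 1]]], "R": [[0.5, 1.0], [0.0, 0.0]]}

if __name__ == "__main__":
    reward, n_del, visited = run([SAFE, TRAP], true_idx=1, prior=[0.5, 0.5], steps=200)
    print(reward, n_del, 1 in visited)  # prints: 100.0 200 False
```

In this run the agent delegates at every step: the advisor never tries the risky action, so the posterior never separates the two hypotheses. Raising `eps` to 1.0 lets the agent act on its own whenever the trap hypothesis is sampled, while the risky action is still vetoed. The sketch makes no attempt to reproduce the paper's control over how often delegation occurs.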
