Branching Reinforcement Learning

Yihan Du; Wei Chen

arXiv:2202.07995·cs.LG·June 16, 2022

TL;DR

This paper introduces Branching Reinforcement Learning, a model in which each episode generates a tree-structured trajectory, and provides algorithms and theoretical bounds for regret minimization and reward-free exploration.

Contribution

The paper proposes a new branching RL framework, establishes its Bellman equations and key lemmas, and develops computationally efficient algorithms with nearly tight bounds for regret minimization (RM) and reward-free exploration (RFE).

Findings

01

Bounds the total variance by $O(H^2)$ despite an exponentially large trajectory

02

Develops polynomial-time algorithms for regret minimization and reward-free exploration

03

Establishes nearly tight upper and lower bounds for regret minimization and reward-free exploration

Abstract

In this paper, we propose a novel Branching Reinforcement Learning (Branching RL) model, and investigate both Regret Minimization (RM) and Reward-Free Exploration (RFE) metrics for this model. Unlike standard RL where the trajectory of each episode is a single $H$-step path, branching RL allows an agent to take multiple base actions in a state such that transitions branch out to multiple successor states correspondingly, and thus it generates a tree-structured trajectory. This model finds important applications in hierarchical recommendation systems and online advertising. For branching RL, we establish new Bellman equations and key lemmas, i.e., branching value difference lemma and branching law of total variance, and also bound the total variance by only $O(H^2)$ under an exponentially large trajectory. For RM and RFE metrics, we propose computationally efficient algorithms BranchVI…
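To make the tree-structured trajectory concrete, here is a minimal Python sketch of how one episode branches. It is an illustration under stated assumptions, not the paper's model or algorithm: the names `sample_branching_trajectory` and `trajectory_size`, the fixed number of base actions taken per state (`branching`), and the uniform random transitions are all hypothetical choices made for this example.

```python
import random


def sample_branching_trajectory(num_states, num_base_actions, branching, horizon, seed=0):
    """Sample one tree-structured trajectory, returned level by level.

    Assumptions (not from the paper): every visited state takes exactly
    `branching` distinct base actions, and each base action leads to a
    successor drawn uniformly at random from `num_states` states.
    """
    rng = random.Random(seed)
    levels = [[0]]  # step 1: a single root state, labelled 0
    for _ in range(horizon - 1):
        next_level = []
        for _state in levels[-1]:
            # Take several base actions in the same state ...
            actions = rng.sample(range(num_base_actions), branching)
            for _action in actions:
                # ... and let each one branch out to its own successor state.
                next_level.append(rng.randrange(num_states))
        levels.append(next_level)
    return levels


def trajectory_size(branching, horizon):
    """Number of states in the tree: 1 + m + m^2 + ... + m^(H-1)."""
    return sum(branching ** h for h in range(horizon))


if __name__ == "__main__":
    levels = sample_branching_trajectory(
        num_states=5, num_base_actions=4, branching=2, horizon=4
    )
    print([len(level) for level in levels])
    print(trajectory_size(branching=2, horizon=4))
```

With `branching=1` the tree collapses to the single $H$-step path of standard RL; for any larger value the number of visited states grows geometrically in the horizon, which is the exponentially large trajectory the abstract refers to.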

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Auction Theory and Applications

Methods: Balanced Selection