Safe Policy Search for Lifelong Reinforcement Learning with Sublinear Regret

Haitham Bou Ammar; Rasul Tutunov; Eric Eaton

arXiv:1505.05798 · cs.LG · May 22, 2015 · 21 cites

Open Access

TL;DR

This paper introduces a safe lifelong reinforcement learning algorithm that achieves sublinear regret, enabling agents to learn multiple tasks efficiently and safely over time, demonstrated on dynamical systems including quadrotor control.

Contribution

The paper proposes the first lifelong policy gradient method with sublinear regret that enforces safety constraints during online multi-task learning.

Findings

01

Achieves sublinear regret in lifelong policy search.

02

Validates safety and efficiency on benchmark dynamical systems.

03

Demonstrates effectiveness in quadrotor control applications.

Abstract

Lifelong reinforcement learning provides a promising framework for developing versatile agents that can accumulate knowledge over a lifetime of experience and rapidly learn new tasks by building upon prior knowledge. However, current lifelong learning methods exhibit non-vanishing regret as the amount of experience increases and include limitations that can lead to suboptimal or unsafe control policies. To address these issues, we develop a lifelong policy gradient learner that operates in an adversarial setting to learn multiple tasks online while enforcing safety constraints on the learned policies. We demonstrate, for the first time, sublinear regret for lifelong policy search, and validate our algorithm on several benchmark dynamical systems and an application to quadrotor control.
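
The term "sublinear regret" in the abstract can be read against the standard online-learning definition. What follows is a generic sketch and not the paper's own statement: the symbols T (number of rounds), \ell_t (the loss revealed at round t), \theta_t (the learner's parameters at round t), and \mathcal{K} (the feasible set, which here would encode the safety constraints) are notation assumed for illustration.

```latex
% Generic online-learning regret (assumed notation, not taken from the paper):
% cumulative loss of the learner minus that of the best fixed feasible choice.
\[
  \mathrm{Regret}(T) \;=\; \sum_{t=1}^{T} \ell_t(\theta_t)
  \;-\; \min_{\theta \in \mathcal{K}} \sum_{t=1}^{T} \ell_t(\theta)
\]
% Regret is sublinear when the average regret per round vanishes:
\[
  \lim_{T \to \infty} \frac{\mathrm{Regret}(T)}{T} \;=\; 0
\]
```

Under this reading, "non-vanishing regret" describes a learner whose average per-round gap to the best fixed feasible choice does not shrink to zero as experience accumulates.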

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adaptive Dynamic Programming Control