Reinforcement Learning with a Terminator

Guy Tennenholtz; Nadav Merlis; Lior Shani; Shie Mannor; Uri Shalit; Gal Chechik; Assaf Hallak; and Gal Dalal

arXiv:2205.15376 · cs.LG · October 9, 2023

TL;DR

This paper introduces the Termination Markov Decision Process (TerMDP) to model reinforcement learning scenarios with external interruptions, providing theoretical guarantees and practical algorithms for such settings, demonstrated on driving benchmarks and human data.

Contribution

The paper formulates TerMDP, develops confidence bounds and a provably-efficient algorithm for RL with termination, and implements a scalable method tested on high-dimensional benchmarks and human data.

Findings

01. Fast convergence of the proposed method.

02. Significant improvement over baseline approaches.

03. Effective handling of external termination in RL environments.

Abstract

We present the problem of reinforcement learning with exogenous termination. We define the Termination Markov Decision Process (TerMDP), an extension of the MDP framework, in which episodes may be interrupted by an external non-Markovian observer. This formulation accounts for numerous real-world situations, such as a human interrupting an autonomous driving agent for reasons of discomfort. We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds. We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret. Motivated by our theoretical analysis, we design and implement a scalable approach, which combines optimism (w.r.t. termination) and a dynamic discount factor, incorporating the termination probability. We deploy our method on high-dimensional driving and…
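To make the idea of a dynamic discount factor combined with optimism about termination more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a per-step termination-probability estimate is already available, that optimism amounts to subtracting a hypothetical bonus from that estimate, and that the effective discount at each step is the base discount multiplied by the probability of not being terminated. It also assumes the reward at a step is collected before the termination check. The function names and the `bonus` parameter are illustrative only.

```python
from typing import Sequence


def optimistic_termination_prob(p_hat: float, bonus: float) -> float:
    """Optimistic (lower) estimate of the termination probability.

    Assumption: optimism is modeled by subtracting a hypothetical
    confidence bonus from the estimate and clipping to [0, 1].
    """
    return min(1.0, max(0.0, p_hat - bonus))


def dynamic_discount(gamma: float, p_term: float) -> float:
    """Effective discount: base discount times the survival probability."""
    return gamma * (1.0 - p_term)


def termination_aware_return(
    rewards: Sequence[float],
    p_hats: Sequence[float],
    gamma: float = 0.99,
    bonus: float = 0.0,
) -> float:
    """Discounted return where each step's discount shrinks with the
    (optimistic) probability that the episode is terminated after it.

    Assumption: the reward at step t is received before the termination
    check for step t is applied.
    """
    total, weight = 0.0, 1.0
    for reward, p_hat in zip(rewards, p_hats):
        total += weight * reward
        p_term = optimistic_termination_prob(p_hat, bonus)
        weight *= dynamic_discount(gamma, p_term)
    return total


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    risky = [0.5, 0.5, 0.5]  # hypothetical termination estimates
    print(termination_aware_return(rewards, risky, gamma=1.0))
    print(termination_aware_return(rewards, risky, gamma=1.0, bonus=0.5))
```

In this sketch a larger bonus makes the agent behave as if termination were less likely, which raises the value it assigns to risky trajectories; as the estimate becomes more certain, the bonus would shrink and the discount would track the estimated termination probability more closely.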

Code & Models

Repositories

guytenn/terminator (Official)

Videos

Reinforcement Learning with a Terminator · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning