Reinforcement Learning with Trajectory Feedback

Yonathan Efroni; Nadav Merlis; Shie Mannor

arXiv:2008.06036 · cs.LG · March 8, 2021

TL;DR

This paper introduces a new reinforcement learning setting in which feedback is limited to a single score per trajectory (the sum of its rewards) rather than individual per-step rewards. It develops algorithms for this weaker feedback model and analyzes their regret.
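To make the feedback model concrete, here is a minimal sketch, assuming a tabular MDP with integer-indexed states and actions and deterministic mean rewards (no noise). It shows that a trajectory score equals the inner product of the trajectory's state-action visit counts with the unknown reward vector. All function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# A (state, action) pair in a tabular MDP; integer indices are an assumption.
StateAction = Tuple[int, int]


def trajectory_score(trajectory: List[StateAction],
                     reward: Dict[StateAction, float]) -> float:
    """Trajectory feedback: only the sum of rewards along the trajectory is revealed."""
    return sum(reward[sa] for sa in trajectory)


def visitation_counts(trajectory: List[StateAction],
                      num_states: int, num_actions: int) -> List[int]:
    """Count visits to each (s, a) pair, flattened to index s * num_actions + a."""
    counts = [0] * (num_states * num_actions)
    for s, a in trajectory:
        counts[s * num_actions + a] += 1
    return counts


def linear_score(counts: List[int], reward_vec: List[float]) -> float:
    """The same score written as an inner product <counts, reward>."""
    return sum(c * r for c, r in zip(counts, reward_vec))


reward = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.5, (1, 1): 2.0}
trajectory = [(0, 0), (1, 1), (1, 1)]
counts = visitation_counts(trajectory, num_states=2, num_actions=2)
reward_vec = [reward[(s, a)] for s in range(2) for a in range(2)]
print(trajectory_score(trajectory, reward))  # 5.0
print(counts)                                # [1, 0, 0, 2]
print(linear_score(counts, reward_vec))      # 5.0
```

The agent sees only the value 5.0. It does not see which of the three steps contributed what. The score is linear in the reward vector, so reward estimation reduces to a regression problem.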

Contribution

The paper extends RL algorithms to the trajectory-feedback setting, for both known and unknown transition models, and provides regret bounds for the resulting algorithms.
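For reference, the standard definition of regret over K episodes of a finite-horizon MDP is given below. The notation (optimal value V^*, policy \pi_k played in episode k, initial state s_1^k) is assumed here and may differ from the paper's.

```latex
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K} \Big( V_1^{*}(s_1^{k}) - V_1^{\pi_k}(s_1^{k}) \Big),
\qquad
\text{sublinear regret:}\quad \lim_{K \to \infty} \frac{\mathrm{Regret}(K)}{K} = 0 .
```

Sublinear regret means that the average per-episode gap to the optimal policy vanishes as the number of episodes grows.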

Findings

1. The proposed algorithms achieve sublinear regret under trajectory feedback.

2. When the transition model is unknown, a hybrid optimistic/Thompson Sampling approach yields a tractable algorithm.

3. The work broadens the applicability of RL to settings where only limited feedback is available.
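The second finding refers to a hybrid optimistic/Thompson Sampling approach for unknown transitions. The sketch below is one plausible reading of a single planning step, under stated assumptions. First it draws a reward vector around the least-squares estimate (the Thompson Sampling part). Then it runs backward induction on empirical transitions with an additive exploration bonus (the optimistic part). The names `sample_reward` and `optimistic_plan`, the diagonal Gaussian noise, and the form of `bonus` are hypothetical and do not reproduce the paper's exact algorithm.

```python
import random
from typing import List, Tuple


def sample_reward(r_hat: List[float], std: List[float],
                  rng: random.Random) -> List[float]:
    """Thompson-style draw around the reward estimate.

    Simplification (assumption): independent Gaussian noise per coordinate,
    i.e. a diagonal covariance, rather than a full covariance matrix.
    """
    return [rng.gauss(m, s) for m, s in zip(r_hat, std)]


def optimistic_plan(r_tilde: List[float], bonus: List[float],
                    p_hat: List[List[List[float]]], horizon: int,
                    num_states: int, num_actions: int
                    ) -> Tuple[List[List[int]], List[float]]:
    """Finite-horizon backward induction on sampled reward + transition bonus.

    r_tilde and bonus are flat lists indexed by s * num_actions + a;
    p_hat[s][a] holds empirical next-state probabilities.
    Returns a greedy policy policy[h][s] and the first-step values.
    """
    v_next = [0.0] * num_states          # value beyond the horizon is zero
    policy: List[List[int]] = []
    for _ in range(horizon):             # steps H-1, ..., 0
        v_h = [0.0] * num_states
        pi_h = [0] * num_states
        for s in range(num_states):
            best_q, best_a = float("-inf"), 0
            for a in range(num_actions):
                idx = s * num_actions + a
                future = sum(p * v for p, v in zip(p_hat[s][a], v_next))
                q = r_tilde[idx] + bonus[idx] + future
                if q > best_q:
                    best_q, best_a = q, a
            v_h[s], pi_h[s] = best_q, best_a
        policy.append(pi_h)
        v_next = v_h
    policy.reverse()                     # policy[0] is the first step
    return policy, v_next


rng = random.Random(0)
r_tilde = sample_reward([0.2, 0.5], [0.0, 0.0], rng)  # zero noise: deterministic check
p_hat = [[[1.0], [1.0]]]                                # one state, both actions self-loop
plain, _ = optimistic_plan(r_tilde, [0.0, 0.0], p_hat, 2, 1, 2)
optimistic, _ = optimistic_plan(r_tilde, [0.4, 0.0], p_hat, 2, 1, 2)
print(plain)       # [[1], [1]]
print(optimistic)  # [[0], [0]]
```

Adding a bonus of 0.4 to action 0 flips the greedy choice. This illustrates how optimism steers the agent toward state-action pairs whose transitions are less well estimated.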

Abstract

The standard feedback model of reinforcement learning requires revealing the reward of every visited state-action pair. However, in practice, it is often the case that such frequent feedback is not available. In this work, we take a first step towards relaxing this assumption and require a weaker form of feedback, which we refer to as trajectory feedback. Instead of observing the reward obtained after every action, we assume we only receive a score that represents the quality of the whole trajectory observed by the agent, namely, the sum of all rewards obtained over this trajectory. We extend reinforcement learning algorithms to this setting, based on least-squares estimation of the unknown reward, for both the known and unknown transition model cases, and study the performance of these algorithms by analyzing their regret. For cases where the transition model is unknown, we…
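A trajectory score is the inner product of the trajectory's state-action visit counts with the unknown reward vector. The reward can therefore be estimated by least squares from (counts, score) pairs. The following is a minimal, dependency-free sketch under these assumptions: a tabular MDP, noise-free scores in the example, and ridge regularization with a hypothetical parameter `lam`. The function names are illustrative, and this is not the authors' implementation.

```python
from typing import List


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ridge_reward_estimate(counts: List[List[float]], scores: List[float],
                          lam: float = 1.0) -> List[float]:
    """Regularized least squares: r_hat = (lam*I + sum d d^T)^{-1} sum y d."""
    n = len(counts[0])
    V = [[lam if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [0.0] * n
    for d, y in zip(counts, scores):
        for i in range(n):
            b[i] += y * d[i]
            for j in range(n):
                V[i][j] += d[i] * d[j]
    return solve_linear(V, b)


# Example: 4 (s, a) pairs; scores are exact sums of the true mean rewards.
true_reward = [1.0, 0.0, 0.5, 2.0]
counts = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 2]]
scores = [sum(c * r for c, r in zip(d, true_reward)) for d in counts]
r_hat = ridge_reward_estimate(counts, scores, lam=1e-6)
print([round(x, 3) for x in r_hat])
```

With stochastic rewards, the observed scores are noisy and the estimate improves as more trajectories are collected. An online version would update `V` and `b` incrementally after each episode instead of rebuilding them.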
