Optimism and Delays in Episodic Reinforcement Learning

Benjamin Howson; Ciara Pike-Burke; Sarah Filippi

arXiv:2111.07615 · cs.LG · April 7, 2023 · 1 citation



Open Access

TL;DR

This paper examines how delays in feedback affect regret minimization in episodic reinforcement learning, proposing two approaches to handle delays and analyzing their theoretical and empirical impacts.

Contribution

It introduces two general strategies for managing delayed feedback in episodic reinforcement learning and provides theoretical regret bounds for both approaches.

Findings

01. Regret increases by an additive term that depends on the delay, the number of states and actions, and the episode length.

02. Empirical results validate the theoretical regret bounds under various delay distributions.

03. Both waiting and updating immediately are viable strategies, depending on the characteristics of the delays.
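Finding 02 refers to different delay distributions. As a rough illustration of what a delay distribution does to the information available to a learner, the Python sketch below draws one delay per episode and counts, for each episode, how many earlier episodes still have feedback outstanding. The timing convention (feedback from episode j is first usable at episode j + τ_j + 1), the helper names and the three distributions are assumptions chosen for illustration; they are not necessarily those used in the paper's experiments.

```python
import math
import random
from typing import Callable, Dict, List

# A delay sampler maps a seeded RNG to a non-negative integer delay (in episodes).
DelaySampler = Callable[[random.Random], int]


def sample_delays(num_episodes: int, sampler: DelaySampler, seed: int = 0) -> List[int]:
    """Draw one delay per episode from `sampler`, reproducibly via `seed`."""
    rng = random.Random(seed)
    return [sampler(rng) for _ in range(num_episodes)]


def outstanding_feedback(delays: List[int]) -> List[int]:
    """For each episode k, count earlier episodes whose feedback is still missing.

    Assumption: feedback from episode j arrives at the end of episode
    j + delays[j], so it can first be used at the start of episode
    j + delays[j] + 1.
    """
    counts = []
    for k in range(len(delays)):
        counts.append(sum(1 for j in range(k) if j + delays[j] + 1 > k))
    return counts


def geometric(p: float) -> DelaySampler:
    """Geometric delays on {0, 1, 2, ...} with mean (1 - p) / p, via the inverse CDF."""
    def draw(rng: random.Random) -> int:
        # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
        return math.floor(math.log(1.0 - rng.random()) / math.log(1.0 - p))
    return draw


# Hypothetical delay distributions, all with mean 5, chosen for illustration only.
SAMPLERS: Dict[str, DelaySampler] = {
    "fixed(5)": lambda rng: 5,
    "uniform{0..10}": lambda rng: rng.randint(0, 10),
    "geometric(p=1/6)": geometric(1.0 / 6.0),
}

if __name__ == "__main__":
    for name, sampler in SAMPLERS.items():
        delays = sample_delays(1000, sampler, seed=1)
        missing = outstanding_feedback(delays)
        print(f"{name:>18}: mean delay {sum(delays) / len(delays):.2f}, "
              f"mean outstanding {sum(missing) / len(missing):.2f}")
```

Because each episode's feedback is missing for exactly τ subsequent episodes, the long-run average of the outstanding count stays close to the mean delay, boundary effects aside. This offers one informal intuition, not a proof, for why a delay-dependent additive term can appear in the regret.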

Abstract

There are many algorithms for regret minimisation in episodic reinforcement learning. This problem is well-understood from a theoretical perspective, provided that the sequences of states, actions and rewards associated with each episode are available to the algorithm updating the policy immediately after every interaction with the environment. However, feedback is almost always delayed in practice. In this paper, we study the impact of delayed feedback in episodic reinforcement learning from a theoretical perspective and propose two general-purpose approaches to handling the delays. The first involves updating as soon as new information becomes available, whereas the second waits before using newly observed information to update the policy. For the class of optimistic algorithms and either approach, we show that the regret increases by an additive term involving the number of states,…
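The abstract contrasts updating as soon as feedback arrives with waiting before using it. The Python sketch below illustrates that distinction for a generic count-based optimistic learner. Everything in it is a hypothetical simplification rather than the algorithms analysed in the paper: the class and function names, the fixed `wait_period` schedule for the waiting variant, the clipped 1/sqrt(n) reward bonus, rewards assumed to lie in [0, 1], and pre-generated trajectories (in a real learner each trajectory would depend on the policy in force when it was collected).

```python
import math
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# One observed step: (state, action, reward, next_state). Rewards assumed in [0, 1].
Transition = Tuple[int, int, float, int]


class DelayedFeedbackLearner:
    """Hypothetical sketch of the two delay-handling strategies.

    mode="immediate": fold every trajectory into the statistics at the start
    of the first episode after it arrives.
    mode="wait": buffer arrivals and fold them in only every `wait_period`
    episodes (a fixed schedule assumed here for illustration; it is not
    necessarily the waiting rule used in the paper).
    """

    def __init__(self, mode: str = "immediate", wait_period: int = 5) -> None:
        if mode not in ("immediate", "wait"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        self.wait_period = wait_period
        self.counts: DefaultDict[Tuple[int, int], int] = defaultdict(int)
        self.reward_sums: DefaultDict[Tuple[int, int], float] = defaultdict(float)
        self.buffer: List[List[Transition]] = []
        self.num_updates = 0  # how many times the statistics (and hence policy) changed

    def receive(self, trajectory: List[Transition]) -> None:
        """Delayed feedback arrives: store it until the next allowed update."""
        self.buffer.append(trajectory)

    def start_episode(self, episode: int, force: bool = False) -> None:
        """Possibly fold buffered feedback into the statistics before acting."""
        allowed = force or self.mode == "immediate" or episode % self.wait_period == 0
        if allowed and self.buffer:
            for trajectory in self.buffer:
                for s, a, r, _ in trajectory:
                    self.counts[(s, a)] += 1
                    self.reward_sums[(s, a)] += r
            self.buffer.clear()
            self.num_updates += 1

    def optimistic_reward(self, s: int, a: int, bonus_scale: float = 1.0) -> float:
        """Empirical mean plus a generic 1/sqrt(n) bonus, clipped to [0, 1]."""
        n = self.counts.get((s, a), 0)
        if n == 0:
            return 1.0  # unvisited pairs get the maximal reward (optimism)
        mean = self.reward_sums[(s, a)] / n
        return min(1.0, mean + bonus_scale * math.sqrt(1.0 / n))


def run(learner: DelayedFeedbackLearner,
        trajectories: List[List[Transition]],
        delays: List[int]) -> DelayedFeedbackLearner:
    """Deliver trajectory k to the learner at the end of episode k + delays[k]."""
    arrivals: Dict[int, List[List[Transition]]] = defaultdict(list)
    for k, (trajectory, delay) in enumerate(zip(trajectories, delays)):
        arrivals[k + delay].append(trajectory)
    horizon = len(trajectories) + max(delays, default=0) + 1
    for episode in range(horizon):
        learner.start_episode(episode)
        # An agent would plan and act here using learner.optimistic_reward.
        for trajectory in arrivals.get(episode, []):
            learner.receive(trajectory)
    learner.start_episode(horizon, force=True)  # final flush of any leftovers
    return learner
```

For instance, with the three single-step trajectories `[(0, 0, 1.0, 1)]`, `[(0, 0, 0.0, 1)]` and `[(1, 1, 0.5, 0)]` and delays `[0, 3, 0]`, the immediate learner updates its statistics three times, while the waiting learner with `wait_period=5` batches everything into a single update; both end with the same counts. The trade-off this exposes is between using information as early as possible and changing the policy less often.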


Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Smart Grid Energy Management