Simultaneously Updating All Persistence Values in Reinforcement Learning

Luca Sabbioni, Luca Al Daire, Lorenzo Bisi, Alberto Maria Metelli, and Marcello Restelli

arXiv:2211.11620 · cs.LG · November 22, 2022 · 1 citation

TL;DR

This paper introduces the All-Persistence Bellman Operator, enabling simultaneous updates of multiple persistence values in reinforcement learning, improving efficiency and exploration across different time scales.

Contribution

It proposes a novel operator for reinforcement learning that updates all persistence values at once, extending Q-learning and DQN with proven contraction properties.

Findings

01. Effective use of transitions at any time scale improves learning.

02. The approach enhances exploration and performance in Atari games.

03. The operator maintains contraction, ensuring convergence.
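
To make the contraction claim concrete, the sketch below checks the property numerically for the standard Bellman optimality operator on a small random tabular MDP. It is only an illustration of what contraction means: it does not implement the All-Persistence Bellman Operator, and the MDP sizes, the seed, and the helper names (`bellman_optimality`, `sup_distance`, `random_distribution`) are assumptions made for this example.

```python
import random


def bellman_optimality(Q, P, R, gamma):
    """Apply the standard Bellman optimality operator to a tabular Q.

    P[s][a] is a list of next-state probabilities and R[s][a] a scalar reward.
    """
    n_states, n_actions = len(Q), len(Q[0])
    V = [max(row) for row in Q]  # greedy state values
    return [
        [R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_states))
         for a in range(n_actions)]
        for s in range(n_states)
    ]


def sup_distance(Q1, Q2):
    """Largest absolute entrywise difference (sup norm)."""
    return max(abs(x - y) for row1, row2 in zip(Q1, Q2) for x, y in zip(row1, row2))


def random_distribution(rng, n):
    """Random probability vector of length n."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy MDP: sizes, seed, and discount are arbitrary choices.
rng = random.Random(0)
n_states, n_actions, gamma = 4, 2, 0.9
P = [[random_distribution(rng, n_states) for _ in range(n_actions)]
     for _ in range(n_states)]
R = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]

# Two arbitrary Q-tables; applying the operator should bring them closer.
Q1 = [[rng.uniform(-5, 5) for _ in range(n_actions)] for _ in range(n_states)]
Q2 = [[rng.uniform(-5, 5) for _ in range(n_actions)] for _ in range(n_states)]

before = sup_distance(Q1, Q2)
after = sup_distance(bellman_optimality(Q1, P, R, gamma),
                     bellman_optimality(Q2, P, R, gamma))
print(f"distance before: {before:.4f}, after: {after:.4f}, bound: {gamma * before:.4f}")
```

Because the distance between any two Q-tables shrinks by at least a factor `gamma` at each application, repeated application converges to a unique fixed point. This is the kind of guarantee the finding refers to.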

Abstract

In reinforcement learning, the performance of learning agents is highly sensitive to the choice of time discretization. Agents acting at high frequencies have the best control opportunities, but also some drawbacks, such as possibly inefficient exploration and vanishing action advantages. Repeating actions, i.e., action persistence, helps, as it allows the agent to visit wider regions of the state space and improve the estimation of the action effects. In this work, we derive a novel All-Persistence Bellman Operator, which allows an effective use of both the low-persistence experience, by decomposition into sub-transitions, and the high-persistence experience, thanks to the introduction of a suitable bootstrap procedure. In this way, we employ transitions collected at any time scale to update simultaneously the action values of the considered persistence…
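
The two ingredients named in the abstract, decomposition into sub-transitions and a bootstrap procedure, can be sketched for the tabular case as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `update_all_persistences`, the table layout `Q[(state, action, persistence)]`, the exact form of the targets, and the omission of terminal-state handling are all choices made for this example.

```python
from collections import defaultdict


def update_all_persistences(Q, states, action, rewards, actions, k_max,
                            gamma=0.99, alpha=0.1):
    """Update Q[(s, a, k)] for every persistence k in 1..k_max from one segment.

    states  : s_0, ..., s_n visited while `action` was repeated n times
    rewards : r_0, ..., r_{n-1} collected along the way
    Terminal states are not handled in this sketch.
    """
    n = len(rewards)
    for i in range(n):                               # start of a sub-transition
        ret = 0.0
        for d in range(1, min(k_max, n - i) + 1):    # its length in steps
            ret += gamma ** (d - 1) * rewards[i + d - 1]
            s, s_next = states[i], states[i + d]
            # Persistence k == d: the repetition ends at s_next, so bootstrap
            # greedily over every action and persistence (assumed target form).
            best = max(Q[(s_next, a, k)]
                       for a in actions for k in range(1, k_max + 1))
            targets = {d: ret + gamma ** d * best}
            # Persistence k > d: the action would still be held for k - d more
            # steps, so bootstrap from the same action's remaining persistence.
            for k in range(d + 1, k_max + 1):
                targets[k] = ret + gamma ** d * Q[(s_next, action, k - d)]
            for k, target in targets.items():
                Q[(s, action, k)] += alpha * (target - Q[(s, action, k)])
    return Q


# Hypothetical usage: action "a" repeated twice along states 0 -> 1 -> 2.
Q_demo = update_all_persistences(
    defaultdict(float), states=[0, 1, 2], action="a", rewards=[1.0, 1.0],
    actions=["a", "b"], k_max=2, gamma=0.5, alpha=1.0)
print({key: value for key, value in Q_demo.items() if value != 0.0})
```

A single two-step segment here touches both persistence values at both visited states, which is the sense in which all persistence values are updated at once.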

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural Networks and Reservoir Computing · Data Stream Mining Techniques

Methods: Convolution · Dense Connections · Deep Q-Network · Q-Learning