PID Accelerated Temporal Difference Algorithms

Mark Bedaywi; Amin Rakhsha; Amir-massoud Farahmand

arXiv:2407.08803 · cs.LG · September 4, 2024

TL;DR

This paper introduces PID TD Learning and PID Q-Learning, which apply PID-controller ideas to accelerate Temporal Difference learning and Q-Learning on long-horizon reinforcement learning tasks (those with a large discount factor), along with a method for adapting the PID gains when samples are noisy.

Contribution

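The paper extends PID VI, a control-theoretic acceleration of Value Iteration that requires known transition distributions, to the sample-based RL setting as PID TD Learning and PID Q-Learning. It analyzes the convergence of PID TD Learning and its acceleration relative to conventional TD Learning, and introduces a method for adapting the PID gains in the presence of noise.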

Findings

01

PID TD Learning converges faster than conventional TD.

02

Adaptive PID gains improve robustness in noisy settings.

03

Empirical results confirm accelerated learning in long-horizon tasks.

Abstract

Long-horizon tasks, which have a large discount factor, pose a challenge for most conventional reinforcement learning (RL) algorithms. Algorithms such as Value Iteration and Temporal Difference (TD) learning have a slow convergence rate and become inefficient in these tasks. When the transition distributions are given, PID VI was recently introduced to accelerate the convergence of Value Iteration using ideas from control theory. Inspired by this, we introduce PID TD Learning and PID Q-Learning algorithms for the RL setting, in which only samples from the environment are available. We give a theoretical analysis of the convergence of PID TD Learning and its acceleration compared to the conventional TD Learning. We also introduce a method for adapting PID gains in the presence of noise and empirically verify its effectiveness.
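
The abstract describes PID VI only in words, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It assumes a policy-evaluation setting with a known transition matrix and treats the Bellman residual TV - V as the error signal of a PID controller. The exact update form, the gain names kp, ki, kd, the integrator parameters alpha and beta, and the toy three-state chain are all illustrative assumptions, not details taken from the paper. With kp = 1 and ki = kd = 0 the update reduces to ordinary Value Iteration, which is the only configuration run here.

```python
def bellman_backup(P, r, gamma, V):
    """Apply the policy-evaluation Bellman operator: (TV)(s) = r(s) + gamma * sum_t P[s][t] * V(t)."""
    n = len(V)
    return [r[s] + gamma * sum(P[s][t] * V[t] for t in range(n)) for s in range(n)]


def pid_value_iteration(P, r, gamma, kp=1.0, ki=0.0, kd=0.0,
                        alpha=0.05, beta=0.95, iters=500):
    """Hypothetical PID-style value iteration (assumed form, not the paper's exact update).

    P term: step toward the Bellman backup TV.
    I term: exponentially weighted running sum z of Bellman residuals.
    D term: difference between the last two value estimates.
    """
    n = len(r)
    V = [0.0] * n
    V_prev = [0.0] * n
    z = [0.0] * n
    for _ in range(iters):
        TV = bellman_backup(P, r, gamma, V)
        residual = [TV[s] - V[s] for s in range(n)]
        # Integral state: accumulate the residual with decay beta.
        z = [beta * z[s] + alpha * residual[s] for s in range(n)]
        V_next = [
            (1.0 - kp) * V[s] + kp * TV[s]   # proportional term
            + ki * z[s]                      # integral term
            + kd * (V[s] - V_prev[s])        # derivative term
            for s in range(n)
        ]
        V_prev, V = V, V_next
    return V


# Toy three-state Markov reward process (illustrative values only).
P = [[0.9, 0.1, 0.0],
     [0.0, 0.9, 0.1],
     [0.1, 0.0, 0.9]]
r = [0.0, 0.0, 1.0]
gamma = 0.9

# Default gains (kp=1, ki=0, kd=0) recover conventional Value Iteration.
V_plain = pid_value_iteration(P, r, gamma)
backup = bellman_backup(P, r, gamma, V_plain)
bellman_residual = max(abs(a - b) for a, b in zip(backup, V_plain))
```

Nonzero ki or kd changes the dynamics of the iteration, and whether a given choice speeds up or destabilizes convergence depends on the problem. That dependence is the motivation the abstract gives for adapting the gains automatically. In the sample-based setting the paper targets, the exact backup would have to be replaced by estimates built from sampled transitions, which this sketch does not attempt.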

Taxonomy

Topics: Advanced Control Systems Design

Methods: Q-Learning