PID Accelerated Temporal Difference Algorithms
Mark Bedaywi, Amin Rakhsha, Amir-massoud Farahmand

TL;DR
This paper introduces PID-based acceleration techniques for Temporal Difference and Q-Learning algorithms in reinforcement learning, improving convergence speed in long-horizon tasks with noisy environments.
Contribution
It extends PID control concepts to TD and Q-Learning, providing theoretical convergence analysis and adaptive gain methods for noisy sampling environments.
Findings
PID TD Learning converges faster than conventional TD.
Adaptive PID gains improve robustness in noisy settings.
Empirical results confirm accelerated learning in long-horizon tasks.
Abstract
Long-horizon tasks, which have a large discount factor, pose a challenge for most conventional reinforcement learning (RL) algorithms. Algorithms such as Value Iteration and Temporal Difference (TD) learning have a slow convergence rate and become inefficient in these tasks. When the transition distributions are given, PID VI was recently introduced to accelerate the convergence of Value Iteration using ideas from control theory. Inspired by this, we introduce PID TD Learning and PID Q-Learning algorithms for the RL setting, in which only samples from the environment are available. We give a theoretical analysis of the convergence of PID TD Learning and of its acceleration over conventional TD Learning. We also introduce a method for adapting PID gains in the presence of noise and empirically verify its effectiveness.
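To make the control-theoretic idea concrete, here is a minimal sketch of a PID-style update for policy evaluation when the transition matrix is known, in the spirit of the PID VI approach the abstract mentions. The function names, the gain names (kp, ki, kd), the integrator parameters (alpha, beta), and the exact form of the update are illustrative assumptions, not the paper's definitions. The sampled PID TD Learning and PID Q-Learning algorithms are not reproduced here.

```python
def bellman(V, P, r, gamma):
    """Policy-evaluation Bellman operator: (T V)(s) = r(s) + gamma * sum_s' P[s][s'] * V(s')."""
    return [r[s] + gamma * sum(p * v for p, v in zip(P[s], V)) for s in range(len(V))]


def pid_policy_evaluation(P, r, gamma, kp=1.0, ki=0.0, kd=0.0,
                          alpha=0.05, beta=0.95, iters=500):
    """Hypothetical PID-style iteration on the Bellman residual (illustrative only).

    The residual T V - V plays the role of the control error:
      P term: kp * residual
      I term: ki * z, where z is a decayed running sum of residuals
      D term: kd * (V - V_prev), the most recent change in the value estimate
    With kp=1, ki=0, kd=0 this is the standard iteration V <- T V.
    """
    n = len(r)
    V = [0.0] * n
    V_prev = [0.0] * n
    z = [0.0] * n
    for _ in range(iters):
        TV = bellman(V, P, r, gamma)
        residual = [TV[s] - V[s] for s in range(n)]
        # Assumed integrator form: exponentially decayed accumulation of the residual.
        z = [beta * z[s] + alpha * residual[s] for s in range(n)]
        V_next = [V[s] + kp * residual[s] + ki * z[s] + kd * (V[s] - V_prev[s])
                  for s in range(n)]
        V_prev, V = V, V_next
    return V


if __name__ == "__main__":
    # Toy two-state Markov reward process, made up for illustration.
    P = [[0.9, 0.1], [0.2, 0.8]]
    r = [1.0, 0.0]
    print(pid_policy_evaluation(P, r, gamma=0.9))          # plain iteration
    print(pid_policy_evaluation(P, r, gamma=0.9, kd=0.1))  # with a small derivative gain
```

Both calls should settle on the same fixed point of the Bellman operator; the gains only change the path taken to reach it. Whether a given gain setting speeds up or destabilizes the iteration depends on the problem, which is the motivation for adapting the gains rather than fixing them by hand.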
Taxonomy
Topics: Advanced Control Systems Design
Methods: Q-Learning
