A Novel Entropy-Maximizing TD3-based Reinforcement Learning for Automatic PID Tuning
Myisha A. Chowdhury, Qiugang Lu

TL;DR
This paper introduces an entropy-maximizing TD3 (EMTD3) reinforcement learning method for automatic PID tuning, targeting better sample efficiency and convergence toward globally optimal parameters in complex systems.
Contribution
The paper proposes EMTD3, an algorithm that combines an entropy-maximizing stochastic actor for exploration with a deterministic actor for exploitation when tuning PID parameters.
Findings
Improved sample efficiency over traditional methods
Faster convergence to optimal PID parameters
Effective in tuning second-order systems
Abstract
Proportional-integral-derivative (PID) controllers are widely used in the process industry. However, the control performance of a PID controller depends strongly on its tuning parameters. Conventional PID tuning methods require extensive knowledge of the system model, which is not always available, especially for complex dynamical systems. In contrast, reinforcement learning-based PID tuning has gained popularity because it can treat PID tuning as a black-box problem and deliver the optimal PID parameters without requiring explicit process models. In this paper, we present a novel entropy-maximizing twin-delayed deep deterministic policy gradient (EMTD3) method for automating PID tuning. In the proposed method, an entropy-maximizing stochastic actor is employed at the beginning to encourage exploration of the action space. Then a deterministic actor is…
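The black-box view described in the abstract can be illustrated with a small sketch: a tuner proposes PID gains, runs the closed loop, and observes only a scalar cost. The plant below (a second-order system with hypothetical natural frequency `wn` and damping ratio `zeta`), the integral-absolute-error cost, and the function name `closed_loop_cost` are all assumptions made for illustration; none of them is taken from the paper.

```python
def closed_loop_cost(kp, ki, kd, setpoint=1.0, dt=0.01, steps=1000,
                     wn=1.0, zeta=0.5):
    """Integral of absolute error for a step response (lower is better).

    Assumed plant (illustrative only): y'' + 2*zeta*wn*y' + wn^2*y = wn^2*u
    """
    y, ydot = 0.0, 0.0                 # plant output and its derivative
    integral = 0.0                     # running integral of the error
    prev_err = setpoint - y            # avoids a derivative kick at step 0
    cost = 0.0
    for _ in range(steps):
        err = setpoint - y
        integral += err * dt
        deriv = (err - prev_err) / dt
        u = kp * err + ki * integral + kd * deriv   # PID control law
        prev_err = err
        # Semi-implicit Euler step of the assumed second-order plant
        yddot = wn ** 2 * (u - y) - 2.0 * zeta * wn * ydot
        ydot += yddot * dt
        y += ydot * dt
        cost += abs(err) * dt
    return cost


# A tuner only ever sees (gains -> cost); an RL agent could use -cost as reward.
untuned = closed_loop_cost(0.0, 0.0, 0.0)
tuned = closed_loop_cost(2.0, 1.0, 0.5)
print(untuned, tuned)
```

In a reinforcement learning setting, the action would be the gain triple `(kp, ki, kd)` and the reward could be the negative of this cost, though the reward actually used by the authors is not stated in this excerpt.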
Taxonomy
Topics: Extremum Seeking Control Systems · Advanced Control Systems Optimization · Viral Infectious Diseases and Gene Expression in Insects
Methods: Experience Replay · Clipped Double Q-learning · Target Policy Smoothing · Adam · Dense Connections · Twin Delayed Deep Deterministic
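Two of the methods listed here, clipped double Q-learning and target policy smoothing, are standard components of TD3 and meet in the critic's bootstrap target. The sketch below shows that target for a scalar action. The function name `td3_target`, the hyperparameter defaults, and the callable signatures are assumptions for illustration, and the entropy-maximizing part of EMTD3 is not shown because the excerpt does not specify it.

```python
import random


def td3_target(reward, next_state, done, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               act_low=-1.0, act_high=1.0, rng=random):
    """Standard TD3 critic target for one transition with a scalar action.

    target_actor(s) -> action; target_q1(s, a), target_q2(s, a) -> value.
    These callables stand in for the target networks (hypothetical here).
    """
    # Target policy smoothing: add clipped Gaussian noise to the target action
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    a_next = max(act_low, min(act_high, target_actor(next_state) + noise))
    # Clipped double Q-learning: bootstrap from the smaller of the two critics
    q_min = min(target_q1(next_state, a_next), target_q2(next_state, a_next))
    # No bootstrapping past a terminal transition (done == 1.0)
    return reward + gamma * (1.0 - done) * q_min


# Toy usage with constant critics, so the noise does not affect the result
y = td3_target(1.0, 0.0, 0.0,
               target_actor=lambda s: 0.0,
               target_q1=lambda s, a: 2.0,
               target_q2=lambda s, a: 1.0)
print(y)
```

Taking the minimum of the two critics counteracts value overestimation, and the clipped noise keeps the target from exploiting sharp peaks in the learned Q-function.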
