Twin actor twin delayed deep deterministic policy gradient (TATD3) learning for batch process control

Tanuja Joshi; Shikhar Makker; Hariprasad Kodamana; Harikumar Kandath

arXiv:2102.13012 · eess.SY · September 20, 2021

TL;DR

This paper introduces TATD3, a novel reinforcement learning algorithm with twin actor networks and new reward functions, designed to improve control in complex, nonlinear batch processes.

Contribution

It develops the TATD3 algorithm by integrating twin actor networks into TD3 and proposes two novel reward functions for enhanced batch process control.

Findings

01

TATD3 outperforms existing RL algorithms in batch process control tasks.

02

The proposed reward functions improve learning efficiency and control accuracy.

03

TATD3 demonstrates robustness across various process examples.

Abstract

Control of batch processes is difficult because of their complex nonlinear dynamics and unsteady-state operating conditions, both within a batch and from batch to batch. Some of these challenges may be addressed by control strategies that interact directly with the process and learn from experience. Recent studies have indicated the advantage of an ensemble of actors in actor-critic Reinforcement Learning (RL) frameworks for improving the policy. The present study proposes an actor-critic RL algorithm, twin actor twin delayed deep deterministic policy gradient (TATD3), which incorporates twin actor networks into the existing twin-delayed deep deterministic policy gradient (TD3) algorithm for continuous control. In addition, two novel reward functions are proposed for the TATD3 controller. We showcase the efficacy of the…
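
As a rough illustration of the TD3 machinery the abstract builds on, the sketch below computes a critic target using clipped double Q-learning and target policy smoothing, with two target actors. The function name `td3_target`, its parameters, and in particular the choice to average the two actors' actions are assumptions made for this example; the abstract does not state how TATD3 combines its twin actors.

```python
import numpy as np

def td3_target(reward, done, next_state, target_actors, target_critics,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0,
               rng=None):
    """Critic target with clipped double Q-learning and target policy smoothing.

    target_actors: list of callables, state -> action (two for a twin-actor setup).
    target_critics: list of two callables, (state, action) -> Q-value.
    """
    rng = np.random.default_rng() if rng is None else rng
    # ASSUMPTION (not from the paper): combine the twin actors by averaging
    # the actions they propose for the next state.
    action = np.mean([actor(next_state) for actor in target_actors], axis=0)
    # Target policy smoothing: add clipped Gaussian noise, then clip to the
    # action bounds.
    noise = np.clip(rng.normal(0.0, noise_std, size=action.shape),
                    -noise_clip, noise_clip)
    action = np.clip(action + noise, -act_limit, act_limit)
    # Clipped double Q-learning: use the smaller of the two critic estimates.
    q_min = np.minimum(target_critics[0](next_state, action),
                       target_critics[1](next_state, action))
    # No bootstrapping past a terminal transition (done == 1).
    return reward + gamma * (1.0 - done) * q_min
```

In a full training loop the actors and critics would be neural networks updated from an experience replay buffer, with the actor updates delayed relative to the critic updates; those parts are omitted here.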

Taxonomy

Methods: Adam · Clipped Double Q-learning · Target Policy Smoothing · Experience Replay · Dense Connections · Twin Delayed Deep Deterministic