Safe Deployment of Offline Reinforcement Learning via Input Convex Action Correction
Alex Durkin, Jasper Stolte, Matthew Jones, Raghuraman Pitchumani, Bei Li, Christian Michler, Mehmet Mercangöz

TL;DR
This paper presents a safe offline reinforcement learning framework for chemical process control. It introduces a safety layer based on an input convex neural network that corrects actions in real time to improve stability and performance.
Contribution
It introduces a novel input convex neural network safety layer for offline RL, enabling real-time action correction without environment interaction.
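To make the idea of convex action correction concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the class `ActionConvexNet`, the function `correct_action`, the layer sizes, the random weights, and the step-halving rule are all assumptions chosen for illustration. The network is convex in the action because the hidden-to-output weights are non-negative and softplus is convex and non-decreasing, so a simple projected gradient descent from the policy's proposed action cannot get stuck in a spurious local minimum of the learned cost.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)); convex and non-decreasing.
    return np.logaddexp(0.0, x)

def sigmoid(x):
    # Derivative of softplus.
    return 1.0 / (1.0 + np.exp(-x))

class ActionConvexNet:
    """Hypothetical two-layer network whose scalar output is convex in the action.

    The state enters only through affine terms, so convexity in the
    action holds for every fixed state.
    """

    def __init__(self, state_dim, action_dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.Wa0 = rng.normal(size=(hidden, action_dim))
        self.Ws0 = rng.normal(size=(hidden, state_dim))
        self.b0 = np.zeros(hidden)
        # Non-negative weights on the hidden layer preserve convexity.
        self.Wz1 = np.abs(rng.normal(size=hidden))
        self.Wa1 = rng.normal(size=action_dim)
        self.Ws1 = rng.normal(size=state_dim)

    def cost(self, s, a):
        z1 = softplus(self.Wa0 @ a + self.Ws0 @ s + self.b0)
        return float(self.Wz1 @ z1 + self.Wa1 @ a + self.Ws1 @ s)

    def grad_action(self, s, a):
        pre = self.Wa0 @ a + self.Ws0 @ s + self.b0
        return self.Wa0.T @ (self.Wz1 * sigmoid(pre)) + self.Wa1

def correct_action(net, s, a0, low, high, step=0.1, iters=50):
    """Projected gradient descent on the action, accepting only improving steps."""
    a = np.clip(a0, low, high)
    best = net.cost(s, a)
    for _ in range(iters):
        cand = np.clip(a - step * net.grad_action(s, a), low, high)
        c = net.cost(s, cand)
        if c < best:
            a, best = cand, c
        else:
            step *= 0.5  # shrink the step when no improvement is found
    return a

# Usage: correct a proposed action for one (made-up) state.
net = ActionConvexNet(state_dim=3, action_dim=2, seed=1)
s = np.array([0.5, -0.2, 1.0])
a_proposed = np.array([0.8, -0.6])
a_corrected = correct_action(net, s, a_proposed, -1.0, 1.0)
```

In a trained system the weights would come from fitting the network to offline data rather than from a random generator, and the non-negativity of `Wz1` would have to be enforced during training (for example by clipping or reparameterising the weights).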
Findings
Offline RL with convex action correction outperforms traditional control methods.
The safety layer maintains stability across various process scenarios.
The approach is feasible for high-stakes chemical process control.
Abstract
Offline reinforcement learning (offline RL) offers a promising framework for developing control strategies in chemical process systems using historical data, without the risks or costs of online experimentation. This work investigates the application of offline RL to the safe and efficient control of an exothermic polymerisation continuous stirred-tank reactor. We introduce a Gymnasium-compatible simulation environment that captures the reactor's nonlinear dynamics, including reaction kinetics, energy balances, and operational constraints. The environment supports three industrially relevant scenarios: startup, grade change down, and grade change up. It also includes reproducible offline datasets generated from proportional-integral controllers with randomised tunings, providing a benchmark for evaluating offline RL algorithms in realistic process control tasks. We assess behaviour…
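The dataset-generation idea mentioned in the abstract (proportional-integral controllers with randomised tunings, made reproducible by a fixed seed) can be sketched as follows. This is only an illustration under stated assumptions: the first-order plant is a toy stand-in for the reactor, and the function names `rollout_pi` and `make_dataset`, the gain ranges, the actuator limits, and the reward are all hypothetical rather than taken from the paper or its environment.

```python
import numpy as np

def rollout_pi(kp, ki, setpoint=1.0, steps=100, dt=0.1, tau=2.0):
    """Run one episode of a PI controller on a toy first-order plant.

    Returns an array with one row per step: (obs, action, reward, next_obs).
    """
    y, integral = 0.0, 0.0
    transitions = []
    for _ in range(steps):
        error = setpoint - y
        integral += error * dt
        # PI control law with assumed actuator limits.
        u = float(np.clip(kp * error + ki * integral, -5.0, 5.0))
        # Toy dynamics (explicit Euler); NOT the reactor model from the paper.
        y_next = y + dt * (-y + u) / tau
        reward = -(setpoint - y_next) ** 2
        transitions.append((y, u, reward, y_next))
        y = y_next
    return np.array(transitions)

def make_dataset(n_episodes=5, seed=0):
    """Collect episodes from PI controllers whose gains are sampled from a seeded RNG."""
    rng = np.random.default_rng(seed)
    episodes = []
    for _ in range(n_episodes):
        kp = rng.uniform(0.5, 3.0)  # assumed range for the proportional gain
        ki = rng.uniform(0.1, 1.0)  # assumed range for the integral gain
        episodes.append(rollout_pi(kp, ki))
    return np.concatenate(episodes, axis=0)

dataset = make_dataset(n_episodes=5, seed=0)
```

Because the only source of randomness is the seeded generator, calling `make_dataset` twice with the same seed yields identical transitions, which is the property that makes such a dataset usable as a benchmark.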
