# Reinforcement Operator Learning (ROL): A hybrid DeepONet-guided reinforcement learning framework for stabilizing the Kuramoto–Sivashinsky equation

**Authors:** Nadim Ahmed, Md. Ashraful Babu, Muhammad Sajjad Hossain, Md. Fayz-Al-Asad, Md. Awlad Hossain, Md. Mortuza Ahmmed, M. Mostafizur Rahman, Mufti Mahmud, Angelo Marcelo Tusset

PMC · DOI: 10.1371/journal.pone.0341161 · PLOS One · 2026-01-30

## TL;DR

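This paper introduces Reinforcement Operator Learning (ROL), a hybrid control framework that pairs an offline-trained DeepONet control law with an online TD3 residual to stabilize the chaotic Kuramoto–Sivashinsky equation. ROL reaches a steady-state energy of 0.40 ± 0.14, a reported 99.1% reduction relative to LQR and 64.3% relative to pure TD3.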

## Contribution

ROL combines a DeepONet, trained offline to learn a generalized control law, with a TD3 residual policy that adapts online, giving sample-efficient stabilization of chaotic PDEs.
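One way to read this decomposition is as a residual control law acting on the Kuramoto–Sivashinsky equation. The block below is a sketch under assumptions, not the paper's stated formulation: the additive forcing $f$, the operator symbol $\mathcal{G}_\theta$ (standing for the DeepONet output), the policy symbol $\pi_\phi$ (standing for the TD3 actor), and the residual scale $\alpha$ are all introduced here for illustration. Only the uncontrolled left-hand side is the standard form of the equation.

```latex
% Kuramoto–Sivashinsky equation with an assumed additive control forcing f
u_t + u\,u_x + u_{xx} + u_{xxxx} = f(x, t)

% Assumed residual control law: offline operator output plus a scaled online correction
f = \mathcal{G}_\theta(u) + \alpha\,\pi_\phi(u)
```

Under this reading, setting $\alpha = 0$ recovers the purely offline DeepONet law, and the online phase only has to learn a correction to it rather than a full policy.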

## Key findings

- ROL reaches a steady-state energy of 0.40 ± 0.14, a 99.1% reduction relative to LQR and a 64.3% reduction relative to pure TD3.
- The DeepONet prior lets the RL phase reach its reward plateau 2.5× sooner and with 65% lower variance than the baseline.
- ROL confines state amplitudes to ±1.8, three-fold tighter than pure TD3, and halves the energy in 0.19 simulation units, 33% faster than pure TD3.
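The percentages and halving times above are simple functions of an energy trace. The sketch below shows how such metrics could be computed. The function names are hypothetical, the trace passed to `halving_time` is synthetic, and the paper's exact metric definitions are not given in this summary. As a worked number, the drop from the uncontrolled energy of 42.8 to 0.40 quoted in the abstract works out to about 99.1%.

```python
def percent_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline


def halving_time(times, energies):
    """First sampled time at which the energy is at most half its initial value.

    Returns None if the trace never reaches that level.
    """
    target = 0.5 * energies[0]
    for t, e in zip(times, energies):
        if e <= target:
            return t
    return None


# Energies quoted in the abstract: uncontrolled 42.8, ROL steady state 0.40.
reduction = round(percent_reduction(42.8, 0.40), 1)
print(reduction)  # 99.1

# Synthetic trace (illustrative only, not data from the paper).
t_half = halving_time([0.0, 0.1, 0.2, 0.3], [8.0, 6.0, 4.0, 3.0])
print(t_half)  # 0.2
```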

## Abstract

This study presents Reinforcement Operator Learning (ROL), a hybrid control paradigm that marries Deep Operator Networks (DeepONet) for offline acquisition of a generalized control law with a Twin-Delayed Deep Deterministic Policy Gradient (TD3) residual for online adaptation. The framework is assessed on the one-dimensional Kuramoto–Sivashinsky equation, a benchmark for spatio-temporal chaos. Starting from an uncontrolled energy of 42.8, ROL drives the system to a steady-state energy of 0.40 ± 0.14, achieving a 99.1% reduction relative to a linear–quadratic regulator (LQR) and a 64.3% reduction compared with a pure TD3 agent. DeepONet attains a training loss of 7.8 × 10⁻⁶ after only 200 epochs, enabling the RL phase to reach its reward plateau 2.5× sooner and with 65% lower variance than the baseline. Spatio-temporal analysis confirms that ROL restricts state amplitudes to ±1.8 (three-fold tighter than pure TD3 and an order of magnitude below LQR) while halving the energy in 0.19 simulation units (33% faster than pure TD3). These results demonstrate that combining operator learning with residual policy optimisation delivers state-of-the-art, sample-efficient stabilisation of chaotic partial differential equations and offers a scalable template for turbulence suppression, combustion control, and other high-dimensional nonlinear systems.
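To make the setup concrete, here is a minimal, dependency-free sketch of a controlled Kuramoto–Sivashinsky simulation with a residual control law. Everything specific in it is an assumption rather than the paper's configuration: the additive distributed forcing, the domain length `L = 22`, the 32-point finite-difference grid with explicit Euler stepping, the mean-square energy definition, and the two linear feedback functions `base_law` and `residual_law`, which are hypothetical stand-ins for the trained DeepONet and the TD3 actor.

```python
import math


def ks_rhs(u, forcing, dx):
    """Right-hand side of u_t = -u*u_x - u_xx - u_xxxx + f on a periodic grid."""
    n = len(u)
    rhs = []
    for i in range(n):
        # Negative indices wrap around, giving periodic boundaries on the left.
        um2, um1, u0 = u[i - 2], u[i - 1], u[i]
        up1, up2 = u[(i + 1) % n], u[(i + 2) % n]
        ux = (up1 - um1) / (2 * dx)
        uxx = (up1 - 2 * u0 + um1) / dx**2
        uxxxx = (up2 - 4 * up1 + 6 * u0 - 4 * um1 + um2) / dx**4
        rhs.append(-u0 * ux - uxx - uxxxx + forcing[i])
    return rhs


def energy(u):
    """Mean-square energy (an assumed definition)."""
    return sum(v * v for v in u) / len(u)


def base_law(u):
    """Hypothetical stand-in for the offline DeepONet control law."""
    return [-1.0 * v for v in u]


def residual_law(u):
    """Hypothetical stand-in for the online TD3 actor (bounded correction)."""
    return [max(-0.5, min(0.5, -0.2 * v)) for v in u]


def rol_control(u, alpha=0.5):
    """Residual composition: base action plus a scaled correction."""
    return [b + alpha * r for b, r in zip(base_law(u), residual_law(u))]


def no_control(u):
    return [0.0] * len(u)


def simulate(u0, controller, dx, dt=0.01, steps=200):
    """Explicit Euler time stepping; dt is chosen small enough for this grid."""
    u = list(u0)
    for _ in range(steps):
        du = ks_rhs(u, controller(u), dx)
        u = [v + dt * d for v, d in zip(u, du)]
    return u


N, L = 32, 22.0
dx = L / N
# Small perturbation in a linearly unstable Fourier mode.
u0 = [0.1 * math.cos(2 * 2 * math.pi * i * dx / L) for i in range(N)]

free = simulate(u0, no_control, dx)
controlled = simulate(u0, rol_control, dx)
print(energy(u0), energy(free), energy(controlled))
```

In this toy setting the uncontrolled perturbation grows while the controlled one decays. The linear feedback here only illustrates how the two control terms are combined; it says nothing about how the learned DeepONet and TD3 components behave in the paper's experiments.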

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12858074/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12858074/full.md

## References

50 references — full list in the complete paper: https://tomesphere.com/paper/PMC12858074/full.md

---
Source: https://tomesphere.com/paper/PMC12858074