BAPR: Bayesian amnesic piecewise-robust reinforcement learning for non-stationary continuous control
Yifan Zhang, Liang Zheng

TL;DR
This paper introduces BAPR, a Bayesian reinforcement learning method for non-stationary control that detects regime changes online, adjusts how conservative its robust policy is accordingly, and supports its core operator with machine-verified contraction results.
Contribution
It unifies Bayesian change detection with robust ensemble RL, provides formal verification of the operator's contraction properties, and introduces a mode-aware representation module.
Findings
The BAPR operator is a contraction when the belief weights are frozen, that is, when they do not depend on the Q-function.
Formal error bounds are derived and machine-verified for the abstract operator.
The method achieves adaptive conservatism with provable detection delay bounds.
Abstract
Real-world control systems frequently operate under piecewise stationary conditions, where dynamics remain stable for extended periods before undergoing abrupt regime changes. Standard robust RL methods face a fundamental dilemma: a globally conservative policy wastes performance during stable periods, while a locally adaptive policy risks catastrophic failure when the regime changes undetected. We propose BAPR (Bayesian Amnesic Piecewise-Robust SAC), which unifies Bayesian Online Change Detection (BOCD) with robust ensemble RL. The BAPR operator, a convex combination of mode-conditional Bellman operators weighted by a frozen belief distribution, is a contraction. A complementary counterexample, machine-verified in Lean 4, establishes a sharp boundary: when beliefs depend on the Q-function, the contraction factor becomes …
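The frozen-belief contraction claim can be checked numerically on a toy problem. The sketch below is not the paper's implementation. It assumes small random tabular MDPs as the "modes", a hard-max Bellman backup in place of SAC's soft backup, and the discount factor `gamma` as the contraction modulus. The names `make_mode`, `bellman`, `mixture_operator`, and `worst_ratio` are hypothetical. The idea is that a belief-weighted average of operators that each contract by `gamma` in the sup norm should itself contract by `gamma`, provided the weights do not change with Q.

```python
import numpy as np

def make_mode(rng, n_states, n_actions):
    """Random tabular MDP for one hypothetical mode: rewards R[s, a], transitions P[s, a, s']."""
    R = rng.uniform(-1.0, 1.0, size=(n_states, n_actions))
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    return R, P

def bellman(Q, R, P, gamma):
    """Mode-conditional Bellman optimality backup (hard max, a simplification of SAC)."""
    return R + gamma * (P @ Q.max(axis=1))

def mixture_operator(Q, modes, belief, gamma):
    """Convex combination of mode-conditional backups with a frozen belief vector."""
    return sum(w * bellman(Q, R, P, gamma) for w, (R, P) in zip(belief, modes))

def worst_ratio(modes, belief, gamma, rng, trials=200):
    """Largest observed ||T Q1 - T Q2||_inf / ||Q1 - Q2||_inf over random Q pairs."""
    n_states, n_actions = modes[0][0].shape
    worst = 0.0
    for _ in range(trials):
        Q1 = rng.normal(size=(n_states, n_actions))
        Q2 = rng.normal(size=(n_states, n_actions))
        num = np.abs(mixture_operator(Q1, modes, belief, gamma)
                     - mixture_operator(Q2, modes, belief, gamma)).max()
        den = np.abs(Q1 - Q2).max()
        worst = max(worst, num / den)
    return worst

rng = np.random.default_rng(0)
gamma = 0.9                                    # assumed discount factor
modes = [make_mode(rng, 5, 3) for _ in range(4)]
belief = rng.dirichlet(np.ones(len(modes)))    # frozen: independent of Q
ratio = worst_ratio(modes, belief, gamma, rng)
print(f"worst observed ratio: {ratio:.3f} (bound: {gamma})")
```

The observed ratio should never exceed `gamma`. Making `belief` a function of `Q` inside `mixture_operator` removes that guarantee, which is the regime the abstract's counterexample addresses.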
