BAPR: Bayesian amnesic piecewise-robust reinforcement learning for non-stationary continuous control

Yifan Zhang; Liang Zheng

arXiv:2605.16170·cs.LG·May 20, 2026

TL;DR

This paper introduces BAPR, a Bayesian reinforcement learning method that detects regime changes online and pairs robust ensemble policies with machine-verified contraction guarantees for control in non-stationary environments.

Contribution

It unifies Bayesian change detection with robust ensemble RL, provides formal verification of the operator's contraction properties, and introduces a mode-aware representation module.

Findings

01

The BAPR operator is a $\gamma$-contraction when the belief distribution is frozen, that is, independent of the Q-function.

02

Formal error bounds are derived and machine-verified for the abstract operator.

03

The method achieves adaptive conservatism with provable detection delay bounds.

Abstract

Real-world control systems frequently operate under piecewise stationary conditions, where dynamics remain stable for extended periods before undergoing abrupt regime changes. Standard robust RL methods face a fundamental dilemma: a globally conservative policy wastes performance during stable periods, while a locally adaptive policy risks catastrophic failure when the regime changes undetected. We propose BAPR (Bayesian Amnesic Piecewise-Robust SAC), which unifies Bayesian Online Change Detection (BOCD) with robust ensemble RL. The BAPR operator, a convex combination of mode-conditional Bellman operators weighted by a frozen belief distribution, is a $\gamma$-contraction. A complementary counterexample, machine-verified in Lean 4, establishes a sharp boundary: when beliefs depend on the Q-function, the contraction factor becomes $\gamma + \lambda \Delta$ …
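
The contraction claim in the abstract can be checked numerically on a toy problem. The sketch below is illustrative only and is not the authors' implementation: it assumes small random tabular MDPs as stand-ins for the modes, uses the hard-max Bellman optimality operator rather than the soft (SAC) backup, and draws the belief once so that it stays frozen with respect to Q. All names (`make_mode`, `bellman`, `mixed_operator`, `worst_contraction_ratio`) are hypothetical.

```python
import numpy as np


def make_mode(rng, n_states, n_actions):
    """Random tabular MDP standing in for one mode (an assumption of this sketch)."""
    R = rng.uniform(-1.0, 1.0, size=(n_states, n_actions))            # rewards R[s, a]
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
    return R, P


def bellman(Q, R, P, gamma):
    """Mode-conditional Bellman optimality operator on a tabular Q[s, a]."""
    V = Q.max(axis=1)          # greedy state values, shape (n_states,)
    return R + gamma * (P @ V) # expected next-state value under P[s, a, :]


def mixed_operator(Q, modes, belief, gamma):
    """Convex combination of mode-conditional operators, weighted by a frozen belief."""
    return sum(b * bellman(Q, R, P, gamma) for b, (R, P) in zip(belief, modes))


def worst_contraction_ratio(n_trials=200, seed=0, gamma=0.9):
    """Largest observed ratio ||T Q1 - T Q2||_inf / ||Q1 - Q2||_inf over random pairs."""
    rng = np.random.default_rng(seed)
    n_states, n_actions, n_modes = 5, 3, 4
    modes = [make_mode(rng, n_states, n_actions) for _ in range(n_modes)]
    belief = rng.dirichlet(np.ones(n_modes))  # drawn once; never depends on Q
    worst = 0.0
    for _ in range(n_trials):
        Q1 = rng.normal(size=(n_states, n_actions))
        Q2 = rng.normal(size=(n_states, n_actions))
        num = np.abs(mixed_operator(Q1, modes, belief, gamma)
                     - mixed_operator(Q2, modes, belief, gamma)).max()
        den = np.abs(Q1 - Q2).max()
        worst = max(worst, num / den)
    return worst


if __name__ == "__main__":
    # The observed ratio should not exceed gamma (up to floating-point error).
    print(worst_contraction_ratio(gamma=0.9))
```

Because the belief weights are fixed and sum to one, the reward terms cancel in the difference and each mode contributes at most a factor of $\gamma$, so the observed ratio stays at or below $\gamma$. If the weights were instead recomputed from Q, the difference would pick up an extra term, which is the regime the abstract's counterexample addresses.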
