Geometry of Drifting MDPs with Path-Integral Stability Certificates

Zuyuan Zhang; Mahdi Imani; Tian Lan

arXiv:2601.21991·cs.LG·January 30, 2026

Open Access

TL;DR

This paper introduces a geometric framework for nonstationary reinforcement learning that models the drifting environment as a differentiable homotopy path, yielding path-integral stability certificates and adaptive algorithms (HT-RL and HT-MCTS) that improve tracking in oscillatory regimes.

Contribution

It presents a novel geometric perspective on nonstationary MDPs, deriving stability bounds and feasible regions, and develops HT-RL and HT-MCTS algorithms that adaptively respond to environment changes.

Findings

01. Improved tracking and regret in oscillatory regimes

02. Effective online estimation of environment complexity

03. Enhanced stability guarantees for nonstationary RL

Abstract

Real-world reinforcement learning is often nonstationary: rewards and dynamics drift, accelerate, oscillate, and trigger abrupt switches in the optimal action. Existing theory often represents nonstationarity with coarse-scale models that measure how much the environment changes, not how it changes locally -- even though acceleration and near-ties drive tracking error and policy chattering. We take a geometric view of nonstationary discounted Markov Decision Processes (MDPs) by modeling the environment as a differentiable homotopy path and tracking the induced motion of the optimal Bellman fixed point. This yields a length-curvature-kink signature of intrinsic complexity: cumulative drift, acceleration/oscillation, and action-gap-induced nonsmoothness. We prove a solver-agnostic path-integral stability bound and derive gap-safe feasible regions that certify local…
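
To make the idea of a homotopy path of MDPs concrete, here is a minimal sketch in Python. It is not the paper's HT-RL or HT-MCTS, and it does not use the paper's definitions. It assumes a hypothetical 2-state, 2-action MDP whose rewards are interpolated linearly along a parameter tau in [0, 1], solves each MDP by value iteration, and computes rough discrete proxies for the three signature terms: path length of the optimal value function (cumulative drift), summed second differences (acceleration), and the number of greedy-policy switches (kinks). Every name, number, and proxy below is an illustrative assumption.

```python
# Illustrative sketch only: a toy MDP family and discrete proxies.
# Nothing here is taken from the paper; all names are hypothetical.

GAMMA = 0.9  # assumed discount factor

# Fixed transitions: P[s][a] is the distribution over next states.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]


def rewards(tau):
    """Rewards r[s][a], linearly interpolated between two endpoint MDPs."""
    r0 = [[1.0, 0.0], [0.0, 0.5]]
    r1 = [[0.0, 1.0], [0.5, 0.0]]
    return [[(1 - tau) * r0[s][a] + tau * r1[s][a] for a in range(2)]
            for s in range(2)]


def solve(tau, iters=500):
    """Value iteration: returns (approximate V*, greedy policy) at tau."""
    r = rewards(tau)
    v = [0.0, 0.0]
    for _ in range(iters):
        q = [[r[s][a] + GAMMA * sum(p * v[t] for t, p in enumerate(P[s][a]))
              for a in range(2)] for s in range(2)]
        v = [max(q[s]) for s in range(2)]
    policy = [max(range(2), key=lambda a: q[s][a]) for s in range(2)]
    return v, policy


def sup_norm(u, w):
    """Sup-norm distance between two value vectors."""
    return max(abs(a - b) for a, b in zip(u, w))


def signature(n_steps=100):
    """Discrete proxies for (length, curvature, kinks) along the path."""
    solved = [solve(k / n_steps) for k in range(n_steps + 1)]
    values = [v for v, _ in solved]
    policies = [p for _, p in solved]
    # Length proxy: total sup-norm movement of V* between grid points.
    length = sum(sup_norm(values[k + 1], values[k]) for k in range(n_steps))
    # Curvature proxy: sup-norm of second differences of V*.
    bend = sum(
        max(abs(values[k + 1][s] - 2 * values[k][s] + values[k - 1][s])
            for s in range(2))
        for k in range(1, n_steps)
    )
    # Kink proxy: grid steps where the greedy policy changes.
    kinks = sum(policies[k + 1] != policies[k] for k in range(n_steps))
    return length, bend, kinks


if __name__ == "__main__":
    length, bend, kinks = signature()
    print("length:", length, "curvature proxy:", bend, "kinks:", kinks)
```

In this toy setting the curvature proxy concentrates at the grid steps where the greedy action switches, which loosely mirrors the abstract's point that near-ties in the action gap are a source of nonsmoothness. A finer grid or a nonlinear path would change the numbers, so treat the output as qualitative.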

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Autonomous Vehicle Technology and Safety