Adaptive Estimation and Optimal Control in Offline Contextual MDPs without Stationarity

Riddhiman Bhattacharyya; Sayak Chakrabarty; Imon Banerjee

arXiv:2605.03393·stat.ML·May 6, 2026

TL;DR

This paper introduces an adaptive estimation and cost-optimization method for offline contextual MDPs, with optimality guarantees that hold despite non-stationarity and model irregularity.

Contribution

The paper presents what the authors believe to be the first estimator with theoretical guarantees for offline contextual MDPs, using $T$-estimation (Baraud, 2011) to handle non-stationarity and model irregularity.

Findings

01. Established oracle risk bounds under two loss functions.

02. Provided finite sample guarantees for cost optimization.

03. Introduced a new adaptive estimator for offline contextual MDPs.

Abstract

Contextual MDPs are powerful tools with wide applicability in areas from biostatistics to machine learning. However, specializing them to offline datasets has been challenging due to a lack of robust, theoretically backed methods. Our work tackles this problem by introducing a new approach to adaptive estimation and cost optimization of contextual MDPs. This estimator, to the best of our knowledge, is the first of its kind, and is endowed with strong optimality guarantees. We achieve this by overcoming the key technical challenges arising from the endogenous properties of contextual MDPs, such as non-stationarity and model irregularity. Our guarantees are established in complete generality by utilizing the relatively recent and powerful statistical technique of $T$-estimation (Baraud, 2011). We first provide a procedure for selecting an estimator given a sample from a…
