Adaptive Estimation and Optimal Control in Offline Contextual MDPs without Stationarity
Riddhiman Bhattacharyya, Sayak Chakrabarty, Imon Banerjee

TL;DR
This paper introduces a novel adaptive estimation method with strong optimality guarantees for offline contextual MDPs, addressing challenges like non-stationarity and model irregularity.
Contribution
It presents what the authors describe as, to the best of their knowledge, the first estimator with theoretical guarantees for offline contextual MDPs, utilizing $T$-estimation to handle non-stationarity and irregularity.
Findings
Established oracle risk bounds under two loss functions.
Provided finite sample guarantees for cost optimization.
Introduced a new adaptive estimator for offline contextual MDPs.
Abstract
Contextual MDPs are powerful tools with wide applicability in areas from biostatistics to machine learning. However, specializing them to offline datasets has been challenging due to a lack of robust, theoretically backed methods. Our work tackles this problem by introducing a new approach to adaptive estimation and cost optimization of contextual MDPs. This estimator, to the best of our knowledge, is the first of its kind, and is endowed with strong optimality guarantees. We achieve this by overcoming the key technical challenges arising from the endogenous properties of contextual MDPs, such as non-stationarity and model irregularity. Our guarantees are established under complete generality by utilizing the relatively recent and powerful statistical technique of $T$-estimation (Baraud, 2011). We first provide a procedure for selecting an estimator given a sample from a…
