Semiparametric Off-Policy Inference for Optimal Policy Values under Possible Non-Uniqueness

Haoyu Wei

arXiv:2505.13809·math.ST·January 21, 2026

Open Access

TL;DR

This paper develops a semiparametric method for inference on the value of optimal policies in Markov decision processes. It addresses the non-regularity that arises when the optimal policy is non-unique, and it comes with theoretical guarantees and a practical application.

Contribution

The paper introduces NSAVE, a semiparametric method for off-policy inference on optimal policy values. NSAVE is semiparametrically efficient and doubly robust when the optimal policy is unique, and it remains stable when the optimal policy is non-unique.

Findings

01 NSAVE achieves semiparametric efficiency.

02 The method remains stable in degenerate regimes.

03 The application provides patient-specific confidence intervals.

Abstract

Off-policy evaluation (OPE) constructs confidence intervals for the value of a target policy using data generated under a different behavior policy. Most existing inference methods focus on fixed target policies and may fail when the target policy is estimated as optimal, particularly when the optimal policy is non-unique or nearly deterministic. We study inference for the value of optimal policies in Markov decision processes. We characterize the existence of the efficient influence function and show that non-regularity arises under policy non-uniqueness. Motivated by this analysis, we propose a novel Nonparametric SequentiAl Value Evaluation (NSAVE) method, which achieves semiparametric efficiency and retains the double robustness property when the optimal policy is unique, and remains stable in degenerate regimes beyond the scope of…
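The non-regularity mentioned in the abstract can be illustrated with a toy simulation. The sketch below is not NSAVE and not an MDP: it assumes a hypothetical two-action problem with unit-variance Gaussian rewards, where the optimal value is the larger of the two action means and the naive plug-in estimator is the larger of the two sample means. The function name `scaled_plugin_errors` and all parameter values are illustrative assumptions, not taken from the paper.

```python
import math
import random
import statistics


def scaled_plugin_errors(mu_a, mu_b, n=100, reps=2000, seed=0):
    """Simulate sqrt(n) * (plug-in estimate - true optimal value).

    Toy assumption: two actions with N(mu, 1) rewards and n samples each.
    The plug-in estimate is the max of the two sample means.
    """
    rng = random.Random(seed)
    true_value = max(mu_a, mu_b)
    errors = []
    for _ in range(reps):
        mean_a = statistics.fmean(rng.gauss(mu_a, 1.0) for _ in range(n))
        mean_b = statistics.fmean(rng.gauss(mu_b, 1.0) for _ in range(n))
        errors.append(math.sqrt(n) * (max(mean_a, mean_b) - true_value))
    return errors


# Non-unique optimum: both actions are optimal (equal means).
tie_bias = statistics.fmean(scaled_plugin_errors(0.0, 0.0))
# Unique optimum: action a is better by a clear margin.
unique_bias = statistics.fmean(scaled_plugin_errors(1.0, 0.0))

print(f"scaled bias with tied actions:   {tie_bias:.3f}")
print(f"scaled bias with unique optimum: {unique_bias:.3f}")
```

In this toy setting, the scaled error under a tie behaves like the maximum of two independent standard normals, whose mean is 1/sqrt(pi) (about 0.56), so a normal-based interval centered on the plug-in estimate is miscentered. With a unique optimum the scaled error is approximately standard normal and the bias is close to zero.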


Taxonomy

Topics: Supply Chain and Inventory Management