A Unifying View of Coverage in Linear Off-Policy Evaluation
Philip Amortila, Audrey Huang, Akshay Krishnamurthy, Nan Jiang

TL;DR
This paper introduces a unified framework for understanding coverage in linear off-policy evaluation in reinforcement learning, providing a new coverage parameter that generalizes previous notions and offers tighter finite-sample guarantees.
Contribution
It proposes a novel coverage parameter called feature-dynamics coverage, unifying various existing definitions and enabling a comprehensive analysis of linear OPE algorithms.
Findings
Introduces the feature-dynamics coverage parameter.
Provides finite-sample error bounds in terms of this new coverage parameter.
Recovers classical coverage notions under additional assumptions.
Abstract
Off-policy evaluation (OPE) is a fundamental task in reinforcement learning (RL). In the classic setting of linear OPE, finite-sample guarantees often take the form $\text{Evaluation error} \le \mathrm{poly}(C^\pi, d, 1/n, \log(1/\delta))$, where $d$ is the dimension of the features, $n$ is the number of samples, $\delta$ is the failure probability, and $C^\pi$ is a coverage parameter that characterizes the degree to which the visited features lie in the span of the data distribution. While such guarantees are well-understood for several popular algorithms under stronger assumptions (e.g., Bellman completeness), the understanding is lacking and fragmented in the minimal setting where only the target value function is linearly realizable in the features. Despite recent interest in tight characterizations of the statistical rate in this setting, the right notion of coverage remains unclear, and candidate definitions from prior analyses have…
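For a concrete sense of what a linear coverage parameter measures, the sketch below computes one classical notion, the relative condition number $\sup_{v \neq 0} \frac{v^\top \Sigma_\pi v}{v^\top \Sigma_\mu v}$, where $\Sigma_\pi$ is the second-moment matrix of features visited by the target policy and $\Sigma_\mu$ is that of the data distribution. This standard quantity is chosen here only for illustration, under the assumption that $\Sigma_\mu$ is positive definite; it is not the paper's feature-dynamics coverage, and the function name `relative_condition_number` is hypothetical.

```python
import numpy as np


def relative_condition_number(phi_target: np.ndarray, phi_data: np.ndarray) -> float:
    """sup_v (v^T S_t v) / (v^T S_d v) for empirical second-moment matrices.

    phi_target: (n_t, d) features visited by the target policy (hypothetical input).
    phi_data:   (n_d, d) features in the offline dataset.
    Assumes S_d is positive definite; otherwise the ratio is unbounded.
    """
    sigma_t = phi_target.T @ phi_target / phi_target.shape[0]
    sigma_d = phi_data.T @ phi_data / phi_data.shape[0]
    # Symmetric inverse square root of the data matrix via its eigendecomposition.
    evals, evecs = np.linalg.eigh(sigma_d)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # The maximal generalized Rayleigh quotient is the top eigenvalue of the whitened matrix.
    whitened = inv_sqrt @ sigma_t @ inv_sqrt
    return float(np.linalg.eigvalsh(whitened).max())


# Target policy only visits the first feature direction; the data spreads over both.
print(relative_condition_number(np.array([[1.0, 0.0]]), np.eye(2)))
```

In exact arithmetic the example evaluates to 2: the data places only half of its mass on the direction the target policy visits. Because both matrices scale by the same factor when all features are multiplied by a common constant, this particular ratio is unchanged by such rescaling.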
Peer Reviews
Decision: ICLR 2026 Poster
- Uses $Z=\phi(s,a)$ as an instrumental variable to address the errors-in-variables problem induced by the regressor $X=\phi(s,a)-\gamma\,\phi(s',a')$, yielding a finite-sample value bound.
- The proposed feature-dynamics coverage resolves key deficiencies of prior metrics by ensuring scale invariance and a meaningful characterization under general off-policy distributions.
- The new definition of coverage via Proposition 1 is elegant, interpretable, and enables unification of various existing notions…
1. The motivation for key constructions appears late, making the early sections harder to follow.
2. The paper could better distinguish the roles of Theorem 1 and Proposition 1 to clarify the main message.
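The instrumental-variable reading in the first review can be made concrete. The sketch below assumes transitions $(s,a,r,s')$ have already been mapped to features $\phi(s,a)$ and $\phi(s',a')$, with $a'$ drawn from the target policy. With instrument $Z=\phi(s,a)$ and regressor $X=\phi(s,a)-\gamma\,\phi(s',a')$, the just-identified IV estimator $(Z^\top X)^{-1}Z^\top r$ coincides with the LSTDQ solution $\hat A^{-1}\hat b$. The function name `lstdq_iv` is hypothetical, and this is an illustration rather than the authors' code.

```python
import numpy as np


def lstdq_iv(phi: np.ndarray, phi_next: np.ndarray, rewards: np.ndarray, gamma: float) -> np.ndarray:
    """LSTDQ written as a just-identified instrumental-variable regression.

    phi:      (n, d) features phi(s_i, a_i); these serve as the instruments Z.
    phi_next: (n, d) features phi(s'_i, a'_i) with a'_i drawn from the target policy.
    rewards:  (n,) observed rewards r_i.
    """
    n = phi.shape[0]
    z = phi
    x = phi - gamma * phi_next               # regressor containing the sampled next-state features
    a_hat = z.T @ x / n                      # empirical A = E[Z X^T]
    b_hat = z.T @ rewards / n                # empirical b = E[Z r]
    return np.linalg.solve(a_hat, b_hat)     # theta_hat = A^{-1} b = (Z^T X)^{-1} Z^T r


# Noiseless algebraic check: rewards generated exactly as r = X theta_star.
rng = np.random.default_rng(0)
phi, phi_next = rng.normal(size=(500, 3)), rng.normal(size=(500, 3))
theta_star = np.array([1.0, -2.0, 0.5])
rewards = (phi - 0.9 * phi_next) @ theta_star
print(lstdq_iv(phi, phi_next, rewards, gamma=0.9))
```

Because the instrument and the regressor share the same dimension $d$, no weighting matrix is needed. The random features in the check only exercise the algebra (they are not generated by an MDP); when $r = X\theta^\star$ exactly and $\hat A$ is invertible, the estimator returns $\theta^\star$ up to floating-point error.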
1. The proofs have been checked and are mathematically sound.
2. The perspective of the analysis looks new to me.
3. Section 5 is appreciated, since it delivers very clear messages on how to interpret the newly defined parameter and provides a good collection of equivalence results with existing parameters.
1. The so-called "IV perspective" that inspires the new results confuses me a bit.
   * As far as I understand, in a linear model $Y = X^{\top} \theta + \epsilon$, IV is only necessary when $X$ and $\epsilon$ are not independent. Intuitively, I don't see why that should be the case here.
   * It is also a little confusing to refer to Eq. (7) as the linear regression problem, since linear regression should not involve the expectation $\mathbb{E}$, but rather observable individual data points…
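The dependence the reviewer asks about can be traced with a short calculation. Assume linear realizability $Q^\pi(s,a)=\phi(s,a)^\top\theta^\star$, next actions $a'\sim\pi(\cdot\mid s')$, and rewards that are conditionally independent of $(s',a')$ given $(s,a)$, and define the noise as the temporal-difference residual at $\theta^\star$. This is a standard argument added for illustration, not quoted from the paper.

$$
\begin{aligned}
r &= X^\top\theta^\star + \epsilon, \qquad X := \phi(s,a)-\gamma\,\phi(s',a'), \qquad \epsilon := r + \gamma\,\phi(s',a')^\top\theta^\star - \phi(s,a)^\top\theta^\star,\\
\mathbb{E}[\epsilon \mid s,a] &= \mathbb{E}[r \mid s,a] + \gamma\,\mathbb{E}[Q^\pi(s',a') \mid s,a] - Q^\pi(s,a) = 0
\;\;\Rightarrow\;\; \mathbb{E}[\phi(s,a)\,\epsilon] = 0,\\
\mathbb{E}[X\,\epsilon] &= -\gamma\,\mathbb{E}[\phi(s',a')\,\epsilon] = -\gamma^2\,\mathbb{E}\big[\mathrm{Cov}(\phi(s',a') \mid s,a)\big]\,\theta^\star .
\end{aligned}
$$

So the instrument $Z=\phi(s,a)$ is uncorrelated with the noise, while $X$ is generally correlated with it whenever $\phi(s',a')$ is random given $(s,a)$ (stochastic transitions or a stochastic target policy). Ordinary least squares of $r$ on $X$ is then biased, which is exactly the situation IV is designed for; the covariance term vanishes only when the next feature is deterministic given $(s,a)$.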
- The paper introduces the feature-dynamics coverage parameter $C_\phi^\pi$, providing a unified perspective on coverage in linear off-policy evaluation (OPE). Derived from an IV view of the LSTDQ algorithm, $C_\phi^\pi$ quantifies how well features induced by the behavior policy capture the subspace relevant to the target policy. It interprets coverage as occurring within a feature-compressed MDP, linking the environment’s dynamics with the feature representation and offering a scale-invariant…
1. The paper focuses on the linear function approximation setting, assuming $Q_\pi(s,a) = \phi(s,a)^\top \theta^\star$. This assumption enables a clean finite-sample analysis of the LSTDQ estimator and the introduction of the coverage parameter in Equation (13). However, the framework relies on the invertibility of $\Sigma$ and $A$ and applies only to the linear regime. Recent work in off-policy evaluation has advanced toward general function approximation via eluder dimension, where representat…
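The invertibility requirement raised in this review can be checked on data. The sketch below assumes the usual LSTDQ matrices $\Sigma=\mathbb{E}_\mu[\phi(s,a)\phi(s,a)^\top]$ and $A=\mathbb{E}_\mu[\phi(s,a)(\phi(s,a)-\gamma\,\phi(s',a'))^\top]$ (the paper's exact definitions may differ) and reports the smallest singular value of each empirical matrix. Values near zero signal that the corresponding inverse, and hence the LSTDQ solution, is ill-conditioned. The helper name `lstdq_conditioning` is hypothetical.

```python
import numpy as np


def lstdq_conditioning(phi: np.ndarray, phi_next: np.ndarray, gamma: float) -> tuple[float, float]:
    """Smallest singular values of the empirical Sigma and A (hypothetical helper).

    phi:      (n, d) features phi(s_i, a_i) from the data distribution.
    phi_next: (n, d) features phi(s'_i, a'_i) with a'_i drawn from the target policy.
    """
    n = phi.shape[0]
    sigma_hat = phi.T @ phi / n                     # E[phi phi^T]
    a_hat = phi.T @ (phi - gamma * phi_next) / n    # E[phi (phi - gamma phi')^T]
    # compute_uv=False returns only the singular values, sorted in descending order.
    s_sigma = np.linalg.svd(sigma_hat, compute_uv=False)[-1]
    s_a = np.linalg.svd(a_hat, compute_uv=False)[-1]
    return float(s_sigma), float(s_a)


# Classic off-policy failure: phi(s,a) = 1, phi(s',a') = 2, gamma = 0.5 gives A = 0.
phi = np.ones((4, 1))
print(lstdq_conditioning(phi, 2.0 * phi, gamma=0.5))
```

In the example, $\hat\Sigma = 1$ is perfectly conditioned, yet $\hat A = 1 - 0.5 \cdot 2 = 0$ is singular, so LSTDQ has no unique solution. A well-conditioned data covariance alone therefore does not guarantee that $A$ is invertible.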
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research
