Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation
Feichen Gan, Youcun Lu, Yingying Zhang, Yukun Liu

TL;DR
This paper introduces a distribution-free conformal prediction framework for reliable uncertainty quantification in infinite-horizon reinforcement learning, addressing challenges like temporal dependencies and distributional shifts.
Contribution
The paper presents a novel, modular conformal prediction method that integrates with distributional RL to provide valid prediction intervals in complex, high-stakes settings.
Findings
Improves coverage and reliability over standard baselines.
Provides theoretical guarantees under model misspecification.
Demonstrates effectiveness in synthetic and benchmark environments.
Abstract
Reliable uncertainty quantification is crucial for reinforcement learning (RL) in high-stakes settings. We propose a unified conformal prediction framework for infinite-horizon policy evaluation that constructs distribution-free prediction intervals for returns in both on-policy and off-policy settings. Our method integrates distributional RL with conformal calibration, addressing challenges such as unobserved returns, temporal dependencies, and distributional shifts. We propose a modular pseudo-return construction based on truncated rollouts and a time-aware calibration strategy using experience replay and weighted subsampling. These innovations mitigate model bias and restore approximate exchangeability, enabling uncertainty quantification even under policy shifts. Our theoretical analysis provides coverage guarantees that account for model misspecification and importance weight…
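The two ingredients named in the abstract, pseudo-returns from truncated rollouts and weighted calibration, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a generic k-step bootstrapped pseudo-return and a standard weighted split-conformal quantile on absolute-residual scores. All function names, the score choice, and the parameters (`bootstrap_value`, `test_weight`) are hypothetical, introduced only for illustration.

```python
import math


def pseudo_return(rewards, bootstrap_value, gamma):
    """Discounted sum of a truncated rollout plus a discounted tail estimate.

    `bootstrap_value` stands in for a model-based estimate of the return
    beyond the truncation point (an assumption of this sketch).
    """
    g = 0.0
    for t, r in enumerate(rewards):
        g += (gamma ** t) * r
    return g + (gamma ** len(rewards)) * bootstrap_value


def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Weighted (1 - alpha) quantile of calibration scores.

    The test point's weight is treated as mass at +infinity, as in
    standard weighted split conformal prediction.
    """
    total = sum(weights) + test_weight
    cum = 0.0
    for s, w in sorted(zip(scores, weights)):
        cum += w / total
        if cum >= 1.0 - alpha:
            return s
    # Not enough calibration mass: the interval is unbounded.
    return math.inf


def prediction_interval(point_estimate, scores, weights, test_weight, alpha):
    """Symmetric interval around a point estimate of the return."""
    q = weighted_conformal_quantile(scores, weights, test_weight, alpha)
    return point_estimate - q, point_estimate + q
```

In this sketch the calibration scores would be absolute differences between pseudo-returns and predicted returns on held-out states, and the weights would be importance ratios between the target and behavior policies. With all weights equal, the procedure reduces to ordinary split conformal prediction.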
