Offline Policy Comparison under Limited Historical Agent-Environment Interactions
Anton Dereventsov, Joseph D. Daws Jr., Clayton Webster

TL;DR
This paper proposes the Limited Data Estimator (LDE), a method for reliably comparing reinforcement learning policies when only limited historical data is available, a setting in which policy evaluation estimates tend to be biased.
Contribution
The paper introduces the LDE method for policy comparison using limited data and provides theoretical analysis and empirical evidence of its effectiveness.
Findings
LDE is statistically reliable for policy comparison under mild data distribution assumptions.
LDE outperforms other evaluation methods in ranking policies with limited data.
Numerical experiments demonstrate LDE's advantage in various settings.
Abstract
We address the challenge of policy evaluation in real-world applications of reinforcement learning systems where the available historical data is limited due to ethical, practical, or security considerations. This constrained distribution of data samples often leads to biased policy evaluation estimates. To remedy this, we propose that instead of policy evaluation, one should perform policy comparison, i.e., rank the policies of interest in terms of their value based on available historical data. In addition, we present the Limited Data Estimator (LDE) as a simple method for evaluating and comparing policies from a small number of interactions with the environment. According to our theoretical analysis, the LDE is shown to be statistically reliable on policy comparison tasks under mild assumptions on the distribution of the historical data. Additionally, our numerical experiments…
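The distinction the abstract draws between policy evaluation and policy comparison can be illustrated with a small sketch. This is not the LDE, whose definition is not given on this page. It only shows, with made-up numbers, that value estimates can be noticeably biased while still ranking policies correctly. The policy names, the values, and the helper functions `rank_policies` and `kendall_tau` are all hypothetical and exist for illustration only.

```python
def rank_policies(values):
    """Return policy names ordered from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)


def kendall_tau(order_a, order_b):
    """Kendall rank correlation between two orderings of the same items (no ties)."""
    pos_b = {name: i for i, name in enumerate(order_b)}
    n = len(order_a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # order_a places order_a[i] ahead of order_a[j]; check whether order_b agrees.
            if pos_b[order_a[i]] < pos_b[order_a[j]]:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical numbers, not taken from the paper.
true_values = {"pi_a": 1.0, "pi_b": 0.6, "pi_c": 0.2}
estimated_values = {"pi_a": 0.55, "pi_b": 0.40, "pi_c": 0.05}  # every estimate is biased low

true_order = rank_policies(true_values)
estimated_order = rank_policies(estimated_values)

# Evaluation quality: how far the estimates are from the true values.
max_abs_error = max(abs(true_values[p] - estimated_values[p]) for p in true_values)

# Comparison quality: how well the estimated ranking matches the true ranking.
tau = kendall_tau(true_order, estimated_order)

print("true ranking:     ", true_order)
print("estimated ranking:", estimated_order)
print("max abs error:", round(max_abs_error, 2), "| Kendall tau:", tau)
```

In this toy case the estimates are off by as much as 0.45, yet the ranking agrees exactly with the true one, which is the kind of outcome a comparison-oriented criterion rewards and an evaluation-oriented criterion would penalize.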
Taxonomy
Topics: Reinforcement Learning in Robotics
