Beyond Expected Return: Accounting for Policy Reproducibility when Evaluating Reinforcement Learning Algorithms
Manon Flageat, Bryan Lim, Antoine Cully

TL;DR
This paper introduces a new evaluation metric for reinforcement learning policies that accounts for both expected return and reproducibility, addressing limitations of traditional metrics that ignore performance variability.
Contribution
It formalizes policy reproducibility, critiques evaluation procedures that rely on expected return alone, and proposes the Lower Confidence Bound metric for comparing policies while accounting for the stability of their performance.
Findings
Lower Confidence Bound effectively balances performance and reproducibility.
Traditional expected return metrics overlook variability, limiting policy comparison.
Experiments show improved policy evaluation with the proposed metric.
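To make the findings above concrete, here is a minimal sketch of a Lower Confidence Bound score. It assumes the common form "mean minus alpha times standard deviation"; the paper's exact estimator may differ. The function name, the alpha parameter, and the two sets of returns are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def lower_confidence_bound(returns, alpha=1.0):
    """Score roll-out returns as mean - alpha * standard deviation.

    Assumed form for illustration only. alpha is a preference parameter:
    0 recovers the plain expected return, larger values penalise spread more.
    """
    return mean(returns) - alpha * pstdev(returns)


# Hypothetical returns from repeated roll-outs of two policies.
# Both have the same mean return (10.0) but very different spread.
steady = [9.0, 10.0, 11.0, 10.0]
erratic = [0.0, 20.0, 0.0, 20.0]

for name, returns in (("steady", steady), ("erratic", erratic)):
    print(
        name,
        "mean:", mean(returns),
        "LCB(alpha=1):", round(lower_confidence_bound(returns, alpha=1.0), 2),
    )
```

Ranking by mean alone cannot separate the two policies, whereas any alpha greater than zero ranks the steadier policy higher. How large alpha should be depends on how much the application values reproducibility over raw performance.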
Abstract
Many applications of Reinforcement Learning (RL) involve noise or stochasticity in the environment. Beyond their impact on learning, these uncertainties lead the exact same policy to perform differently, i.e. yield different returns, from one roll-out to another. Common evaluation procedures in RL summarise the resulting return distributions using solely the expected return, which does not account for the spread of the distribution. Our work defines this spread as the policy reproducibility: the ability of a policy to obtain similar performance when rolled out many times, a crucial property in some real-world applications. We highlight that existing procedures that only use the expected return are limited on two fronts: first, an infinite number of return distributions with a wide range of performance-reproducibility trade-offs can have the same expected return, limiting its…
Taxonomy
Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Supply Chain and Inventory Management
