IV-Posterior: Inverse Value Estimation for Interpretable Policy Certificates
Tatiana Lopez-Guevara, Michael Burke, Nicholas K. Taylor, Kartic Subr

TL;DR
IV-Posterior introduces a method to create interpretable policy certificates in reinforcement learning by estimating the conditions under which policies are effective, enhancing deployment safety and performance.
Contribution
It proposes inverse value estimation using Masked Autoregressive Flows to identify and utilize the operational conditions of pre-trained policies for better interpretability and deployment.
Findings
Performance improves when policies are selected according to their identified inductive biases.
The method provides interpretable certificates of policy effectiveness.
The approach is applicable across multiple environments.
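The selection idea in the findings above can be illustrated with a minimal sketch. It is not the authors' implementation: a diagonal Gaussian stands in for the Masked Autoregressive Flow, and the policy names, the one-dimensional "friction" condition, and the synthetic data are all hypothetical. Each pre-trained policy gets a density fitted over the conditions in which it performed well, and at deployment the policy whose density is highest at the observed condition is chosen.

```python
import numpy as np


def fit_gaussian(conditions):
    """Fit a diagonal Gaussian to conditions of shape (n, d).

    Assumption: a simple stand-in for the flow-based density in the paper.
    """
    conditions = np.asarray(conditions, dtype=float)
    mean = conditions.mean(axis=0)
    var = conditions.var(axis=0) + 1e-6  # small floor avoids division by zero
    return mean, var


def log_density(x, mean, var):
    """Log-density of x under the diagonal Gaussian (mean, var)."""
    x = np.asarray(x, dtype=float)
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))


def select_policy(condition, certificates):
    """Return the name of the policy whose density is highest at `condition`."""
    scores = {
        name: log_density(condition, mean, var)
        for name, (mean, var) in certificates.items()
    }
    return max(scores, key=scores.get)


# Hypothetical data: conditions (e.g. a friction coefficient) under which
# each pre-trained policy achieved high return.
rng = np.random.default_rng(0)
certificates = {
    "policy_low_friction": fit_gaussian(rng.normal(0.2, 0.05, size=(200, 1))),
    "policy_high_friction": fit_gaussian(rng.normal(0.8, 0.05, size=(200, 1))),
}

chosen = select_policy([0.75], certificates)
print(chosen)
```

The fitted mean and variance per policy act as a crude, human-readable certificate of where that policy is expected to work. A flow-based density would capture richer, multi-modal condition sets at the cost of that simplicity.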
Abstract
Model-free reinforcement learning (RL) is a powerful tool to learn a broad range of robot skills and policies. However, a lack of policy interpretability can inhibit their successful deployment in downstream applications, particularly when differences in environmental conditions may result in unpredictable behaviour or generalisation failures. As a result, there has been a growing emphasis in machine learning around the inclusion of stronger inductive biases in models to improve generalisation. This paper proposes an alternative strategy, inverse value estimation for interpretable policy certificates (IV-Posterior), which seeks to identify the inductive biases or idealised conditions of operation already held by pre-trained policies, and then use this information to guide their deployment. IV-Posterior uses Masked Autoregressive Flows to fit distributions over the set of conditions or…
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
Methods: Interpretability
