Approximate Policy Iteration with Bisimulation Metrics
Mete Kemertas, Allan Jepson

TL;DR
This paper introduces a bisimulation-based approximate policy iteration method for MDPs. It shows that bisimulation metrics can be defined via Sinkhorn distances, proves performance bounds for the method, and evaluates it empirically on finite MDPs.
Contribution
The paper unifies bisimulation metrics with Sinkhorn distances, develops an approximate policy iteration (API) method with performance bounds, and connects the theory to practical actor-critic algorithms.
Findings
Theoretical performance bounds for bisimulation-based API.
Bound on the difference between $\pi$-bisimulation metrics in terms of the change in the policies.
Empirical validation of bisimulation-based API on finite MDPs.
Abstract
Bisimulation metrics define a distance measure between states of a Markov decision process (MDP) based on a comparison of reward sequences. Due to this property, they provide theoretical guarantees in value function approximation (VFA). In this work, we first prove that bisimulation and $\pi$-bisimulation metrics can be defined via a more general class of Sinkhorn distances, which unifies various state similarity metrics used in recent work. Then we describe an approximate policy iteration (API) procedure that uses a bisimulation-based discretization of the state space for VFA and prove asymptotic performance bounds. Next, we bound the difference between $\pi$-bisimulation metrics in terms of the change in the policies themselves. Based on these results, we design an API($\alpha$) procedure that employs conservative policy updates and enjoys better performance bounds than the naive API…
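To make the abstract's central object concrete, here is a minimal sketch of a $\pi$-bisimulation metric on a small finite MDP, computed by fixed-point iteration with an entropy-regularized (Sinkhorn) transport cost in place of the exact Wasserstein distance. It assumes the standard recursion d(s, t) = |r^π(s) - r^π(t)| + γ · W_d(P^π(·|s), P^π(·|t)). The function names, the regularization strength `reg`, the toy MDP, and the choice to report the transport cost of the entropic plan are all illustrative assumptions, not the authors' implementation, and the paper's exact Sinkhorn formulation may differ.

```python
import numpy as np


def sinkhorn_cost(mu, nu, cost, reg=0.1, n_iters=200):
    """Transport cost <plan, cost> of the entropy-regularized OT plan.

    Hypothetical helper: plain Sinkhorn scaling, no log-domain stabilization,
    so it is only suitable for small costs relative to `reg`.
    """
    K = np.exp(-cost / reg)  # Gibbs kernel
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(n_iters):
        v = nu / (K.T @ u + 1e-300)  # match column marginals
        u = mu / (K @ v + 1e-300)    # match row marginals
    plan = u[:, None] * K * v[None, :]
    return float(np.sum(plan * cost))


def pi_bisimulation_metric(r_pi, P_pi, gamma=0.9, reg=0.1, n_sweeps=30):
    """Iterate d <- |r(s) - r(t)| + gamma * Sinkhorn_d(P(.|s), P(.|t)).

    r_pi: (n,) expected rewards under the policy (assumed given).
    P_pi: (n, n) state-to-state transition matrix under the policy.
    """
    n = len(r_pi)
    reward_gap = np.abs(r_pi[:, None] - r_pi[None, :])
    d = np.zeros((n, n))
    for _ in range(n_sweeps):
        d_new = np.zeros_like(d)
        # The diagonal is held at 0; the entropic cost of a distribution to
        # itself is not zero in general, so it is not recomputed here.
        for s in range(n):
            for t in range(s + 1, n):
                w = sinkhorn_cost(P_pi[s], P_pi[t], d, reg=reg)
                d_new[s, t] = d_new[t, s] = reward_gap[s, t] + gamma * w
        d = d_new
    return d


# Toy 3-state chain (an assumption for illustration): states 0 and 1 earn
# reward 0 and move to state 2; state 2 earns reward 1 and self-loops.
r_pi = np.array([0.0, 0.0, 1.0])
P_pi = np.array([
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
])
d = pi_bisimulation_metric(r_pi, P_pi)
print(np.round(d, 4))
```

In this toy chain, states 0 and 1 behave identically under the policy, so their distance is 0 and they could share one cell in a bisimulation-based discretization, while state 2 sits at distance 1 from both because only the immediate rewards differ.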
