Approximate Policy Iteration with Bisimulation Metrics

Mete Kemertas; Allan Jepson

arXiv:2202.02881 · cs.LG · November 15, 2022 · 1 citation

Open Access · 1 repository

TL;DR

This paper introduces a bisimulation-based approximate policy iteration (API) method for MDPs. It shows that bisimulation metrics can be defined via a general class of Sinkhorn distances, which unifies several state similarity metrics, proves performance bounds for the method, and supports them with empirical analysis.

Contribution

The paper unifies bisimulation metrics with Sinkhorn distances, develops an API method with performance bounds, and connects the theory to practical actor-critic algorithms.

Findings

01. Theoretical performance bounds for bisimulation-based API.
02. A bound on the difference between $π$-bisimulation metrics in terms of the change in the underlying policies.
03. Empirical validation of bisimulation-based API on finite MDPs.
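For a finite MDP under a fixed policy, a $π$-bisimulation-style metric can be approximated by repeated sweeps of a fixed-point update. The following is a minimal sketch, not the paper's or the repository's implementation. It assumes the commonly used form d(s, t) = |r(s) - r(t)| + γ · (transport cost between the next-state distributions under d), and it replaces exact optimal transport with an entropy-regularized (Sinkhorn) plan. The function names, the regularization strength `reg`, the cost rescaling, and the iteration counts are all illustrative choices.

```python
import numpy as np


def sinkhorn_cost(p, q, cost, reg=0.1, n_iters=100):
    """Transport cost <plan, cost> of an entropy-regularized plan between p and q."""
    # Assumption of this sketch: scale the regularizer by the largest cost
    # so that exp() stays well away from underflow.
    scale = max(float(cost.max()), 1e-12)
    K = np.exp(-cost / (reg * scale))
    u = np.ones_like(p)
    for _ in range(n_iters):
        # Alternate scaling so the plan's marginals approach q and p.
        v = q / (K.T @ u)
        u = p / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return float(np.sum(plan * cost))


def pi_bisimulation_metric(P_pi, r_pi, gamma=0.9, n_sweeps=50, reg=0.1):
    """Run a fixed number of sweeps of the assumed fixed-point update.

    P_pi: (n, n) state-to-state transition matrix under the policy.
    r_pi: (n,) expected one-step reward under the policy.
    """
    n = len(r_pi)
    reward_gap = np.abs(r_pi[:, None] - r_pi[None, :])
    d = np.zeros((n, n))
    for _ in range(n_sweeps):
        d_next = np.zeros_like(d)
        for s in range(n):
            for t in range(n):
                d_next[s, t] = reward_gap[s, t] + gamma * sinkhorn_cost(
                    P_pi[s], P_pi[t], d, reg
                )
        d = d_next
    return d


# Hypothetical 3-state chain, used only to exercise the functions.
P_example = np.array([[0.9, 0.1, 0.0],
                      [0.1, 0.8, 0.1],
                      [0.0, 0.1, 0.9]])
r_example = np.array([0.0, 0.5, 1.0])
print(np.round(pi_bisimulation_metric(P_example, r_example), 3))
```

The sketch runs a fixed number of sweeps instead of testing for convergence, and because the entropic plan is blurred, the resulting values need not vanish exactly for behaviorally identical states with stochastic transitions.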

Abstract

Bisimulation metrics define a distance measure between states of a Markov decision process (MDP) based on a comparison of reward sequences. Due to this property they provide theoretical guarantees in value function approximation (VFA). In this work we first prove that bisimulation and $π$-bisimulation metrics can be defined via a more general class of Sinkhorn distances, which unifies various state similarity metrics used in recent work. Then we describe an approximate policy iteration (API) procedure that uses a bisimulation-based discretization of the state space for VFA and prove asymptotic performance bounds. Next, we bound the difference between $π$-bisimulation metrics in terms of the change in the policies themselves. Based on these results, we design an API($α$) procedure that employs conservative policy updates and enjoys better performance bounds than the naive API…
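The idea of a conservative policy update can be illustrated on a tiny tabular MDP. The sketch below assumes the mixture form familiar from conservative policy iteration, where the new policy is (1 - α) times the current policy plus α times the greedy policy; the abstract does not spell out the update used by API($α$), so this form is an assumption. The sketch also evaluates the policy exactly with a linear solve, whereas the paper uses a bisimulation-based discretization for value function approximation. All function names and the toy MDP are hypothetical.

```python
import numpy as np


def evaluate_q(P, R, pi, gamma):
    """Exact Q^pi for a tabular MDP. P: (S, A, S), R: (S, A), pi: (S, A)."""
    n_states = R.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state transitions under pi
    r_pi = np.sum(pi * R, axis=1)           # expected one-step reward under pi
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    return R + gamma * (P @ v)              # shape (S, A)


def greedy_policy(Q):
    """One-hot greedy policy for Q of shape (S, A)."""
    pi = np.zeros_like(Q, dtype=float)
    pi[np.arange(Q.shape[0]), Q.argmax(axis=1)] = 1.0
    return pi


def conservative_update(pi, Q, alpha=0.2):
    """Assumed mixture update: move a fraction alpha toward the greedy policy."""
    return (1.0 - alpha) * pi + alpha * greedy_policy(Q)


# Hypothetical 2-state, 2-action MDP: action 0 stays, action 1 switches state,
# and only state 1 pays a reward.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0   # action 0: stay
P[0, 1, 1] = P[1, 1, 0] = 1.0   # action 1: switch
R = np.array([[0.0, 0.0], [1.0, 1.0]])

pi = np.full((2, 2), 0.5)       # start from the uniform policy
for _ in range(50):
    Q = evaluate_q(P, R, pi, gamma=0.9)
    pi = conservative_update(pi, Q, alpha=0.2)
print(np.round(pi, 3))
```

With a small α, each step changes the policy by at most α per entry, which is the kind of small policy change the bound on $π$-bisimulation metrics is stated in terms of.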

Code & Models

Repositories

metekemertas/api-bisim (PyTorch, official)

Taxonomy

Topics: Atrial Fibrillation Management and Outcomes · Fuel Cells and Related Materials · Adversarial Robustness in Machine Learning