Scalable methods for computing state similarity in deterministic Markov Decision Processes

Pablo Samuel Castro

arXiv:1911.09291 · cs.LG · November 22, 2019

TL;DR

This paper introduces scalable algorithms for approximating bisimulation metrics in large deterministic Markov Decision Processes, enabling behavioral state similarity analysis in complex environments.

Contribution

The paper presents a new version of the bisimulation metric that is tied to a behavior policy, along with two algorithms (one sampling-based, one a differentiable loss) for approximating bisimulation metrics in large or continuous-state deterministic MDPs.

Findings

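1. The sampling-based algorithm is guaranteed to converge to the true bisimulation metric.

2. The differentiable loss makes it possible to learn an approximation even in continuous state spaces.

3. Together, the two methods extend bisimulation metrics to large deterministic MDPs, where exact tabular computation is impractical.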

Abstract

We present new algorithms for computing and approximating bisimulation metrics in Markov Decision Processes (MDPs). Bisimulation metrics are an elegant formalism that capture behavioral equivalence between states and provide strong theoretical guarantees on differences in optimal behaviour. Unfortunately, their computation is expensive and requires a tabular representation of the states, which has thus far rendered them impractical for large problems. In this paper we present a new version of the metric that is tied to a behavior policy in an MDP, along with an analysis of its theoretical properties. We then present two new algorithms for approximating bisimulation metrics in large, deterministic MDPs. The first does so via sampling and is guaranteed to converge to the true metric. The second is a differentiable loss which allows us to learn an approximation even for continuous state…
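To make the sampling idea concrete, here is a minimal sketch, not the paper's implementation. It assumes the discount-weighted form of the metric, in which a deterministic MDP satisfies d(s,t) = max_a ( |R(s,a) - R(t,a)| + gamma * d(N(s,a), N(t,a)) ), with N the deterministic next-state function. The 3-state MDP, the names NEXT, REWARD, exact_metric and sampled_metric, and the uniform sampling of (s, t, a) triples are all hypothetical choices made for illustration.

```python
import random

GAMMA = 0.9
N_STATES, N_ACTIONS = 3, 2

# Hypothetical toy MDP (an assumption for illustration, not from the paper).
NEXT = [[1, 2], [1, 2], [2, 2]]                    # NEXT[s][a] -> next state
REWARD = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]      # REWARD[s][a] -> reward


def target(d, s, t, a):
    """One-step backup for the pair (s, t) under a shared action a."""
    reward_gap = abs(REWARD[s][a] - REWARD[t][a])
    return reward_gap + GAMMA * d[NEXT[s][a]][NEXT[t][a]]


def exact_metric(sweeps=500):
    """Tabular fixed-point iteration over all state pairs, starting from zero."""
    d = [[0.0] * N_STATES for _ in range(N_STATES)]
    for _ in range(sweeps):
        d = [[max(target(d, s, t, a) for a in range(N_ACTIONS))
              for t in range(N_STATES)]
             for s in range(N_STATES)]
    return d


def sampled_metric(n_samples=50000, seed=0):
    """Sampling variant: update one (s, t, a) triple at a time.

    Each estimate only ever increases, and it never exceeds the fixed point
    because it starts at zero and the backup is monotone in d.
    """
    rng = random.Random(seed)
    d = [[0.0] * N_STATES for _ in range(N_STATES)]
    for _ in range(n_samples):
        s = rng.randrange(N_STATES)
        t = rng.randrange(N_STATES)
        a = rng.randrange(N_ACTIONS)
        d[s][t] = max(d[s][t], target(d, s, t, a))
    return d


if __name__ == "__main__":
    exact = exact_metric()
    sampled = sampled_metric()
    for s in range(N_STATES):
        print([round(x, 4) for x in exact[s]], [round(x, 4) for x in sampled[s]])
```

In this toy MDP, states 0 and 1 have identical rewards and successors, so their distance is 0. State 2 differs from both by a reward gap of 1 at every step under action 0, which gives a distance of 1 / (1 - 0.9) = 10. The sketch only covers the tabular case; the differentiable loss described in the abstract would replace the table d with a learned function.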

Code & Models

Repositories

google-research/google-research/tree/master/bisimulation_aaai2020 (TensorFlow, official)
