TL;DR
This paper introduces scalable algorithms for approximating bisimulation metrics in large deterministic Markov Decision Processes, enabling behavioral state similarity analysis in complex environments.
Contribution
It presents a new version of the bisimulation metric that is tied to a behavior policy, and two algorithms for approximating bisimulation metrics in large or continuous-state deterministic MDPs: one based on sampling and one based on a differentiable loss.
Findings
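The sampling-based algorithm is guaranteed to converge to the true bisimulation metric.
The differentiable loss allows an approximation to be learned even for continuous state spaces.
Together, the two methods extend bisimulation metrics to large deterministic MDPs, where exact tabular computation was previously impractical.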
Abstract
We present new algorithms for computing and approximating bisimulation metrics in Markov Decision Processes (MDPs). Bisimulation metrics are an elegant formalism that capture behavioral equivalence between states and provide strong theoretical guarantees on differences in optimal behaviour. Unfortunately, their computation is expensive and requires a tabular representation of the states, which has thus far rendered them impractical for large problems. In this paper we present a new version of the metric that is tied to a behavior policy in an MDP, along with an analysis of its theoretical properties. We then present two new algorithms for approximating bisimulation metrics in large, deterministic MDPs. The first does so via sampling and is guaranteed to converge to the true metric. The second is a differentiable loss which allows us to learn an approximation even for continuous state…
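To make the abstract's first algorithm more concrete, here is a minimal sketch on a tiny deterministic MDP. With deterministic transitions, the bisimulation metric satisfies the fixed point d(s, t) = max_a [ |R(s, a) - R(t, a)| + γ d(N(s, a), N(t, a)) ], where N(s, a) is the unique next state. The sketch computes this both by exact tabular iteration and by a sampling-style update over pairs of transitions. The toy MDP, the function names `exact_metric` and `sampled_metric`, and the uniform sampling scheme are assumptions made for illustration; they are not taken from the paper's code.

```python
import numpy as np

# Hypothetical toy MDP: 4 states, 2 actions, deterministic transitions.
R = np.array([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]])  # R[s, a]
N = np.array([[1, 2], [0, 2], [3, 3], [3, 3]])                  # next state N[s, a]

def exact_metric(R, N, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Tabular fixed-point iteration over all state pairs."""
    n_states = R.shape[0]
    d = np.zeros((n_states, n_states))
    reward_diff = np.abs(R[:, None, :] - R[None, :, :])  # shape (S, S, A)
    for _ in range(max_iters):
        # Look up d at the pair of successor states for every (s, t, a).
        next_d = d[N[:, None, :], N[None, :, :]]          # shape (S, S, A)
        new_d = (reward_diff + gamma * next_d).max(axis=2)
        if np.abs(new_d - d).max() < tol:
            return new_d
        d = new_d
    return d

def sampled_metric(R, N, gamma=0.9, n_samples=50_000, seed=0):
    """Sampling-style update: only pairs of sampled transitions are touched."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    d = np.zeros((n_states, n_states))
    for _ in range(n_samples):
        s, t = rng.integers(n_states, size=2)
        a = rng.integers(n_actions)
        target = abs(R[s, a] - R[t, a]) + gamma * d[N[s, a], N[t, a]]
        # Starting from zero, the estimate only ever increases toward the
        # fixed point, so keeping the running maximum is safe.
        d[s, t] = max(d[s, t], target)
    return d

d_exact = exact_metric(R, N)
d_sampled = sampled_metric(R, N)
print(np.round(d_exact, 3))
print("max gap between exact and sampled:", np.abs(d_exact - d_sampled).max())
```

In this toy problem, states 0 and 1 receive the same rewards and lead to each other or to the same successor, so their distance is zero. The second algorithm in the abstract, the differentiable loss for continuous state spaces, is not sketched here.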
