A Theoretical Analysis of State Similarity Between Markov Decision Processes
Zhenyu Tao, Wei Xu, Xiaohu You

TL;DR
This paper introduces a generalized bisimulation metric (GBSM) for quantifying state similarity between different MDPs, rigorously proves its fundamental metric properties, and derives tighter bounds for policy transfer and state aggregation.
Contribution
It formally establishes GBSM, proves its key metric properties, and analyzes its theoretical advantages over the standard BSM in multi-MDP applications.
Findings
Proves that GBSM satisfies symmetry, an inter-MDP triangle inequality, and a distance bound on identical state spaces.
Derives explicit bounds for policy transfer and state aggregation across MDPs that are tighter than those derived from the standard BSM.
Provides a sample complexity bound for GBSM estimation.
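The first finding can be illustrated numerically. The following is a minimal sketch, not the authors' construction, built on three assumptions: a Castro-style recursion d(s1, s2) = max_a [ |R1(s1, a) - R2(s2, a)| + γ · W_d(P1(·|s1, a), P2(·|s2, a)) ], where W_d is the Kantorovich distance with ground cost d; two MDPs that share one action set; and deterministic transitions, so W_d between two point masses reduces to d evaluated at the pair of next states. The function gbsm_deterministic and the toy MDPs are hypothetical. The code checks GBSM symmetry and the inter-MDP triangle inequality on small examples.

```python
from typing import List

# Hypothetical encoding of a deterministic MDP over a shared action set:
# R[s][a] is the reward and N[s][a] the unique next state for action a in s.
Table = List[List[float]]
Succ = List[List[int]]


def gbsm_deterministic(R1: Table, N1: Succ, R2: Table, N2: Succ,
                       gamma: float = 0.9, iters: int = 200) -> Table:
    """Iterate d(s1, s2) = max_a |R1 - R2| + gamma * d(N1[s1][a], N2[s2][a]).

    Assumption: with deterministic transitions, the Kantorovich term between
    two point masses equals d at the pair of next states.
    """
    n1, n2, m = len(R1), len(R2), len(R1[0])
    d = [[0.0] * n2 for _ in range(n1)]
    for _ in range(iters):
        # The right-hand side reads only the previous iterate of d.
        d = [[max(abs(R1[s1][a] - R2[s2][a]) + gamma * d[N1[s1][a]][N2[s2][a]]
                  for a in range(m))
              for s2 in range(n2)]
             for s1 in range(n1)]
    return d


# Three toy MDPs (hypothetical data) with two actions each.
R1, N1 = [[1.0, 0.0], [0.0, 0.5]], [[1, 0], [0, 1]]
R2, N2 = [[0.8, 0.1], [0.2, 0.4], [0.0, 1.0]], [[1, 2], [2, 0], [0, 1]]
R3, N3 = [[0.5, 0.5]], [[0, 0]]

d12 = gbsm_deterministic(R1, N1, R2, N2)
d21 = gbsm_deterministic(R2, N2, R1, N1)
d23 = gbsm_deterministic(R2, N2, R3, N3)
d13 = gbsm_deterministic(R1, N1, R3, N3)

# Symmetry: d_{M1,M2}(s1, s2) equals d_{M2,M1}(s2, s1).
symmetric = all(abs(d12[i][j] - d21[j][i]) <= 1e-12
                for i in range(2) for j in range(3))
# Inter-MDP triangle inequality, routed through every state of M2.
triangle = all(d13[i][k] <= d12[i][j] + d23[j][k] + 1e-9
               for i in range(2) for j in range(3) for k in range(1))
print(symmetric, triangle)  # True True
```

For stochastic transitions the point-mass shortcut no longer applies. Each step would instead solve a small optimal-transport problem between P1(·|s1, a) and P2(·|s2, a) with cost matrix d.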
Abstract
The bisimulation metric (BSM) is a powerful tool for analyzing state similarities within a Markov decision process (MDP), revealing that states closer in BSM have more similar optimal value functions. While BSM has been successfully utilized in reinforcement learning (RL) for tasks like state representation learning and policy exploration, its application to state similarity between multiple MDPs remains challenging. Prior work has attempted to extend BSM to pairs of MDPs, but a lack of well-established mathematical properties has limited further theoretical analysis between MDPs. In this work, we formally establish a generalized bisimulation metric (GBSM) for measuring state similarity between arbitrary pairs of MDPs and rigorously prove that it satisfies three fundamental metric properties, i.e., GBSM symmetry, inter-MDP triangle inequality, and a distance bound on identical spaces.…
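The value-function property that motivates BSM (closer states have closer optimal values) can be checked for a pair of MDPs in a minimal sketch. It assumes, rather than reproduces, the paper's definition: a Castro-style recursion d(s1, s2) = max_a [ |R1(s1, a) - R2(s2, a)| + γ · W_d(P1(·|s1, a), P2(·|s2, a)) ], a shared action set, and deterministic transitions, so the Kantorovich term reduces to d at the pair of next states. The names gbsm_deterministic and optimal_values and the toy MDPs are hypothetical. The code checks |V1*(s1) - V2*(s2)| <= d(s1, s2) for every state pair.

```python
from typing import List

# Hypothetical encoding of a deterministic MDP over a shared action set:
# R[s][a] is the reward and N[s][a] the unique next state for action a in s.
Table = List[List[float]]
Succ = List[List[int]]


def gbsm_deterministic(R1: Table, N1: Succ, R2: Table, N2: Succ,
                       gamma: float = 0.9, iters: int = 200) -> Table:
    """Assumed recursion: d(s1, s2) = max_a |R1 - R2| + gamma * d(next1, next2)."""
    n1, n2, m = len(R1), len(R2), len(R1[0])
    d = [[0.0] * n2 for _ in range(n1)]
    for _ in range(iters):
        d = [[max(abs(R1[s1][a] - R2[s2][a]) + gamma * d[N1[s1][a]][N2[s2][a]]
                  for a in range(m))
              for s2 in range(n2)]
             for s1 in range(n1)]
    return d


def optimal_values(R: Table, N: Succ, gamma: float = 0.9,
                   iters: int = 200) -> List[float]:
    """Value iteration: V(s) = max_a R[s][a] + gamma * V(N[s][a])."""
    n, m = len(R), len(R[0])
    v = [0.0] * n
    for _ in range(iters):
        v = [max(R[s][a] + gamma * v[N[s][a]] for a in range(m)) for s in range(n)]
    return v


# Hypothetical toy MDPs: states 0 and 1 of M2 replicate M1; state 2 is new.
R1, N1 = [[1.0, 0.0], [0.0, 0.5]], [[1, 0], [0, 1]]
R2, N2 = [[1.0, 0.0], [0.0, 0.5], [0.3, 0.3]], [[1, 0], [0, 1], [2, 0]]

d = gbsm_deterministic(R1, N1, R2, N2)
v1, v2 = optimal_values(R1, N1), optimal_values(R2, N2)

# A smaller distance must imply a smaller gap in optimal values.
bound_holds = all(abs(v1[i] - v2[j]) <= d[i][j] + 1e-9
                  for i in range(2) for j in range(3))
print(bound_holds, d[0][0], d[1][1])  # True 0.0 0.0
```

Running both iterations for the same number of steps keeps the inequality exact at every iterate, not only in the limit. Each value-iteration step is dominated by the corresponding step of the distance recursion, because |max_a f(a) - max_a g(a)| <= max_a |f(a) - g(a)|.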
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Software Engineering Methodologies
