Joint MDPs and Reinforcement Learning in Coupled-Dynamics Environments
Ege C. Kaya, Mahsa Ghasemi, Abolfazl Hashemi

TL;DR
This paper introduces joint MDPs (JMDPs), a formalism for environments where the joint distribution of one-step outcomes across multiple actions matters. JMDPs extend classical MDPs, which specify only the marginal law for each action, to capture coupled dynamics.
Contribution
The paper proposes JMDPs that incorporate joint distributions of multi-action outcomes, along with Bellman operators for return moments, enabling new algorithms with convergence guarantees.
Findings
Formalization of joint MDPs for coupled-dynamics environments
Development of Bellman operators for return moments
Algorithms with proven convergence guarantees
Abstract
Many distributional quantities in reinforcement learning are intrinsically joint across actions, including distributions of gaps and probabilities of superiority. However, the classical Markov decision process (MDP) formalism specifies only marginal laws and leaves the joint law of counterfactual one-step outcomes across multiple possible actions at a state unspecified. We study coupled-dynamics environments with a multi-action generative interface which can sample counterfactual one-step outcomes for multiple actions under shared exogenous randomness. We propose joint MDPs (JMDPs) as a formalism for such environments by augmenting an MDP with a multi-action sample transition model which specifies a coupling of one-step counterfactual outcomes, while preserving standard MDP interaction as marginal observations. We adopt and formalize a one-step coupling regime where dependence across…
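The point that marginal laws alone do not determine joint quantities can be illustrated with a toy example. The sketch below is not taken from the paper: the two-action state, the binary outcomes, and all function names are hypothetical, chosen only to show two couplings with identical per-action marginals that give different probabilities of superiority and different gap distributions.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

# A coupling is a dict mapping (outcome_action0, outcome_action1) -> probability.
# Both couplings below are illustrative assumptions, not the paper's construction.

def shared_noise_coupling():
    """Both actions are driven by one shared coin u: action 0 yields u, action 1 yields 1 - u."""
    return {(u, 1 - u): HALF for u in (0, 1)}

def independent_coupling():
    """Each action is driven by its own independent fair coin."""
    return {(x0, x1): HALF * HALF for x0, x1 in product((0, 1), repeat=2)}

def marginal(joint, action_index):
    """Marginal outcome law of a single action, as a classical MDP would specify it."""
    out = {}
    for outcomes, p in joint.items():
        key = outcomes[action_index]
        out[key] = out.get(key, Fraction(0)) + p
    return out

def prob_superiority(joint):
    """P(outcome of action 1 > outcome of action 0), a joint quantity."""
    return sum(p for (x0, x1), p in joint.items() if x1 > x0)

def gap_distribution(joint):
    """Law of the gap (outcome of action 1 minus outcome of action 0)."""
    out = {}
    for (x0, x1), p in joint.items():
        out[x1 - x0] = out.get(x1 - x0, Fraction(0)) + p
    return out

if __name__ == "__main__":
    for name, joint in [("shared noise", shared_noise_coupling()),
                        ("independent", independent_coupling())]:
        print(name, marginal(joint, 0), marginal(joint, 1),
              prob_superiority(joint), gap_distribution(joint))
```

In both couplings each action's outcome is a fair coin, so an interface that exposes only marginals cannot tell them apart. The probability of superiority is 1/2 under the shared-noise coupling and 1/4 under the independent one, which is the kind of difference a multi-action sample model is meant to pin down.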
Taxonomy
Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Embodied and Extended Cognition
