Joint MDPs and Reinforcement Learning in Coupled-Dynamics Environments

Ege C. Kaya; Mahsa Ghasemi; Abolfazl Hashemi

arXiv:2603.06946·cs.LG·March 10, 2026

TL;DR

This paper introduces joint MDPs (JMDPs), a formalism for environments in which the joint distribution of one-step outcomes across multiple actions matters. JMDPs extend classical MDPs, which specify only per-action marginal laws, to capture coupled dynamics.

Contribution

The paper proposes JMDPs that incorporate joint distributions of multi-action outcomes, along with Bellman operators for return moments, enabling new algorithms with convergence guarantees.

Findings

1. Formalization of joint MDPs for coupled-dynamics environments
2. Development of Bellman operators for return moments
3. Algorithms with proven convergence guarantees
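As background for the second finding, the classical second-moment recursion for a fixed policy may help show what a Bellman operator for a return moment looks like. This is the standard single-policy result, not the paper's operators, which this summary does not state. The symbols are assumptions introduced here: $G$ is the discounted return, $R$ the one-step reward, $\gamma$ the discount factor, $Q^\pi$ the first moment of $G$, and $M^\pi$ its second moment.

```latex
% Return recursion under a fixed policy \pi (assumed notation):
%   G(s,a) = R + \gamma\, G(S',A'), \quad S' \sim P(\cdot \mid s,a),\; A' \sim \pi(\cdot \mid S')
\begin{align}
Q^\pi(s,a) &= \mathbb{E}\big[\, R + \gamma\, Q^\pi(S',A') \,\big|\, s,a \,\big], \\
M^\pi(s,a) &= \mathbb{E}\big[\, R^2 + 2\gamma\, R\, Q^\pi(S',A') + \gamma^2 M^\pi(S',A') \,\big|\, s,a \,\big].
\end{align}
```

The second line follows from squaring the return recursion and applying the Markov property to the cross term. Both equations depend only on the marginal law of each action's outcome, which is why quantities that compare several actions at once need the additional joint structure the paper describes.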

Abstract

Many distributional quantities in reinforcement learning are intrinsically joint across actions, including distributions of gaps and probabilities of superiority. However, the classical Markov decision process (MDP) formalism specifies only marginal laws and leaves the joint law of counterfactual one-step outcomes across multiple possible actions at a state unspecified. We study coupled-dynamics environments with a multi-action generative interface which can sample counterfactual one-step outcomes for multiple actions under shared exogenous randomness. We propose joint MDPs (JMDPs) as a formalism for such environments by augmenting an MDP with a multi-action sample transition model which specifies a coupling of one-step counterfactual outcomes, while preserving standard MDP interaction as marginal observations. We adopt and formalize a one-step coupling regime where dependence across…
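The distinction the abstract draws between marginal and joint laws can be illustrated with a minimal sketch. It assumes a hypothetical one-state, two-action toy model in which each reward is a fixed mean plus Gaussian noise; the function names `sample_outcomes` and `summarize`, the means, and the noise model are all illustrative assumptions, not the paper's interface. With shared exogenous noise and with independent noise the per-action marginals match, but the gap distribution and the probability of superiority differ.

```python
import numpy as np

# Hypothetical toy model (an assumption, not from the paper):
# reward(a) = MEANS[a] + noise, at a single state with two actions.
MEANS = np.array([0.5, 0.0])


def sample_outcomes(rng, n, coupled):
    """Draw n one-step reward pairs, one column per action.

    coupled=True  : both actions share the same exogenous noise draw.
    coupled=False : each action receives its own independent draw.
    Either way, each column is marginally N(MEANS[a], 1).
    """
    if coupled:
        z = rng.standard_normal(n)
        noise = np.stack([z, z], axis=1)
    else:
        noise = rng.standard_normal((n, 2))
    return MEANS + noise


def summarize(rewards):
    """Marginal statistics plus two joint quantities across actions."""
    gap = rewards[:, 0] - rewards[:, 1]
    return {
        "marginal_means": rewards.mean(axis=0),
        "marginal_stds": rewards.std(axis=0),
        "gap_std": gap.std(),                 # spread of the gap distribution
        "p_superior": (gap > 0).mean(),       # P(action 0 beats action 1)
    }


rng = np.random.default_rng(0)
coupled_stats = summarize(sample_outcomes(rng, 100_000, coupled=True))
indep_stats = summarize(sample_outcomes(rng, 100_000, coupled=False))

for name, stats in [("coupled", coupled_stats), ("independent", indep_stats)]:
    print(name, {k: np.round(v, 3) for k, v in stats.items()})
```

In this toy model the shared noise cancels in the gap, so action 0 wins on every coupled draw, while under independent noise it wins only about 64% of the time. A standard MDP description, which fixes only the marginals, cannot tell these two environments apart.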

Taxonomy

Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Embodied and Extended Cognition