Simulation-Based Optimistic Policy Iteration For Multi-Agent MDPs with Kullback-Leibler Control Cost

Khaled Nakhleh; Ceyhun Eksin; Sabit Ekin

arXiv:2410.15156 · cs.AI · October 22, 2024

TL;DR

This paper introduces a simulation-based optimistic policy iteration method for multi-agent MDPs with KL control costs, enabling agents to independently compute optimal policies with proven convergence.

Contribution

It presents a novel agent-based OPI scheme that handles KL control costs and demonstrates convergence for both synchronous and asynchronous evaluations.

Findings

01 Converges to the optimal value function and policy asymptotically.

02 Agents can compute policies independently using the Boltzmann distribution.

03 Validated on a multi-agent game with KL control costs.

Abstract

This paper proposes an agent-based optimistic policy iteration (OPI) scheme for learning stationary optimal stochastic policies in multi-agent Markov Decision Processes (MDPs), in which agents incur a Kullback-Leibler (KL) divergence cost for their control efforts and an additional cost for the joint state. The proposed scheme consists of a greedy policy improvement step followed by an m-step temporal difference (TD) policy evaluation step. We use the separable structure of the instantaneous cost to show that the policy improvement step follows a Boltzmann distribution that depends on the current value function estimate and the uncontrolled transition probabilities. This allows agents to compute the improved joint policy independently. We show that both the synchronous (entire state space evaluation) and asynchronous (a uniformly sampled set of substates) versions of the OPI scheme with…
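
The improvement and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a single agent, a discounted cost with factor `gamma`, a per-step cost of the form q(x) + KL(u(.|x) || P(.|x)), and exact expected m-step backups in place of simulated TD updates. The names `boltzmann_improvement`, `kl_rows`, and `opi` are hypothetical. Under these assumptions, minimizing q(x) + KL(u || P) + gamma * E_u[V] over distributions u gives a policy proportional to P(x'|x) * exp(-gamma * V(x')), which depends only on the value estimate and the uncontrolled transition probabilities.

```python
import numpy as np

def boltzmann_improvement(P, V, gamma=0.9):
    """Greedy policy under the assumed cost q(x) + KL(u || P) + gamma * E_u[V].

    P: (n, n) uncontrolled transition matrix, rows sum to 1.
    V: (n,) current value function estimate.
    """
    # Shifting V by its minimum avoids overflow; the shift cancels on normalization.
    weights = P * np.exp(-gamma * (V - V.min()))[None, :]
    return weights / weights.sum(axis=1, keepdims=True)

def kl_rows(U, P):
    """Row-wise KL(U || P), treating 0 * log 0 as 0."""
    mask = U > 0
    ratio = np.where(mask, U / np.where(mask, P, 1.0), 1.0)
    return np.sum(U * np.log(ratio), axis=1)

def opi(P, q, gamma=0.9, m=3, iters=200):
    """Synchronous optimistic policy iteration with exact m-step backups (a sketch)."""
    V = np.zeros(len(q))
    for _ in range(iters):
        U = boltzmann_improvement(P, V, gamma)   # policy improvement
        cost = q + kl_rows(U, P)                 # state cost plus KL control cost
        for _ in range(m):                       # m-step policy evaluation
            V = cost + gamma * U @ V
    return V, U

# Illustrative usage on a small random problem (all values are made up).
rng = np.random.default_rng(0)
P_demo = rng.random((5, 5))
P_demo /= P_demo.sum(axis=1, keepdims=True)
q_demo = rng.random(5)
V_demo, U_demo = opi(P_demo, q_demo)
```

In this sketch the fixed point can be checked against the closed-form Bellman equation V(x) = q(x) - log(sum over x' of P(x'|x) * exp(-gamma * V(x'))), which follows from substituting the Boltzmann policy back into the one-step cost. The multi-agent, asynchronous, and simulation-based aspects of the paper's scheme are not represented here.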

Taxonomy

Topics: Auction Theory and Applications

Methods: Sparse Evolutionary Training