Policy Optimization in Multi-Agent Settings under Partially Observable Environments

Ainur Zhaikhan, Malek Khammassi, and Ali H. Sayed

arXiv:2508.06061 · cs.MA · August 11, 2025

Open Access

TL;DR

This paper combines adaptive social learning with multi-agent reinforcement learning for systems operating under partial observability. By alternating a single step of each, it avoids the cost of two-timescale training while, in simulations, approaching the performance of reinforcement learning with the true state known.

Contribution

The paper proposes a framework in which social learning and reinforcement learning run concurrently, replacing existing two-timescale methods, and supports it with theoretical guarantees.

Findings

1. Performance approaches that of full-state RL in simulations
2. Reduces computational complexity of multi-agent learning
3. Provides theoretical guarantees for the method

Abstract

This work leverages adaptive social learning to estimate partially observable global states in multi-agent reinforcement learning (MARL) problems. Unlike existing methods, the proposed approach enables the concurrent operation of social learning and reinforcement learning. Specifically, it alternates between a single step of social learning and a single step of MARL, eliminating the need for the time- and computation-intensive two-timescale learning frameworks. Theoretical guarantees are provided to support the effectiveness of the proposed method. Simulation results verify that the performance of the proposed methodology can approach that of reinforcement learning when the true state is known.
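
The alternation described in the abstract can be illustrated with a minimal sketch. The update rules themselves are not given on this page, so everything below is an assumption made for illustration: a log-linear adaptive social learning update with step size `delta`, geometric averaging of neighbors' beliefs through a combination matrix `A`, independent tabular Q-learning indexed by each agent's most likely state, and a hypothetical two-state toy environment. It is not the authors' algorithm.

```python
import numpy as np

def social_learning_step(beliefs, likelihoods, A, delta=0.1):
    """One adaptive social learning step (assumed log-linear form).

    beliefs:     (K, S) array; row k is agent k's belief over S global states
    likelihoods: (K, S) array; likelihood of agent k's new observation per state
    A:           (K, K) combination matrix; A[l, k] is the weight agent k
                 gives to neighbor l (each column sums to 1)
    delta:       adaptation step size in (0, 1)
    """
    eps = 1e-12  # guards against log(0)
    # Local adaptive update: discount the old belief, weigh in the new evidence.
    log_psi = (1 - delta) * np.log(beliefs + eps) + delta * np.log(likelihoods + eps)
    # Geometric averaging over neighbors: log mu_k = sum_l A[l, k] * log psi_l.
    log_mu = A.T @ log_psi
    log_mu -= log_mu.max(axis=1, keepdims=True)  # numerical stability
    mu = np.exp(log_mu)
    return mu / mu.sum(axis=1, keepdims=True)

def q_learning_step(Q, s_hat, a, r, s_hat_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step on estimated states (updates Q in place)."""
    td_target = r + gamma * Q[s_hat_next].max()
    Q[s_hat, a] += alpha * (td_target - Q[s_hat, a])
    return Q

def run_toy(n_steps=2000, seed=0):
    """Alternate one social learning step and one RL step per iteration."""
    rng = np.random.default_rng(seed)
    K, S, n_actions = 3, 2, 2
    A = np.full((K, K), 1.0 / K)  # fully connected, uniform weights (assumed)
    # Hypothetical observation model: obs_model[state, obs] = P(obs | state).
    obs_model = np.array([[0.7, 0.3], [0.3, 0.7]])
    beliefs = np.full((K, S), 1.0 / S)
    Q = np.zeros((K, S, n_actions))
    state = 0
    s_hat = beliefs.argmax(axis=1)
    total_reward = 0.0
    for _ in range(n_steps):
        # Epsilon-greedy action per agent, based on its estimated state.
        greedy = Q[np.arange(K), s_hat].argmax(axis=1)
        explore = rng.random(K) < 0.1
        actions = np.where(explore, rng.integers(n_actions, size=K), greedy)
        # Toy reward: 1 when the action matches the hidden global state.
        rewards = (actions == state).astype(float)
        # The hidden state occasionally flips, then agents observe it noisily.
        if rng.random() < 0.01:
            state = 1 - state
        obs = rng.choice(S, size=K, p=obs_model[state])
        likelihoods = obs_model[:, obs].T  # (K, S)
        # Single step of social learning ...
        beliefs = social_learning_step(beliefs, likelihoods, A)
        s_hat_next = beliefs.argmax(axis=1)
        # ... followed by a single step of reinforcement learning.
        for k in range(K):
            q_learning_step(Q[k], s_hat[k], actions[k], rewards[k], s_hat_next[k])
        s_hat = s_hat_next
        total_reward += rewards.mean()
    return total_reward / n_steps
```

Calling `run_toy()` returns the average per-step reward across agents. The point of the sketch is the loop body: belief estimation and policy learning advance together, one step each, rather than running social learning to convergence before every RL update.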

Taxonomy

Topics: Auction Theory and Applications