Adaptive Opponent Policy Detection in Multi-Agent MDPs: Real-Time Strategy Switch Identification Using Running Error Estimation

Mohidul Haque Mridul; Mohammad Foysal Khan; Redwan Ahmed Rizvee; Md Mosaddek Khan

arXiv:2406.06500 · cs.AI · June 11, 2024

TL;DR

This paper introduces OPS-DeMo, an online method for real-time detection of opponent policy changes in multi-agent environments, enhancing robustness and decision-making accuracy in dynamic scenarios.

Contribution

We propose OPS-DeMo, a novel online algorithm that detects opponent policy switches using dynamic error decay and maintains a bank of response policies for improved adaptability.
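The idea of detecting a switch through a decaying running error, then picking a response from a policy bank, can be illustrated with a small sketch. It is not the authors' implementation. The per-step error (one minus the probability a candidate opponent model assigns to the observed action), the exponential decay factor `decay`, the `threshold` rule, and every name used here (`SwitchDetector`, `observe`, `response`, the toy opponents) are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical interface: an opponent model returns the probability that
# the opponent takes `action` in `state`.
OpponentModel = Callable[[int, int], float]


@dataclass
class SwitchDetector:
    """Illustrative running-error switch detector (assumed update rule,
    not the exact OPS-DeMo formulation)."""
    models: Dict[str, OpponentModel]   # bank of known opponent policies
    responses: Dict[str, str]          # opponent policy -> response policy (bank)
    decay: float = 0.9                 # assumed decay factor in [0, 1)
    threshold: float = 0.5             # assumed switch threshold
    current: Optional[str] = None      # opponent policy currently assumed
    errors: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.errors = {name: 0.0 for name in self.models}
        if self.current is None:
            self.current = next(iter(self.models))

    def observe(self, state: int, action: int) -> bool:
        """Update each running error; return True when a switch is flagged."""
        for name, model in self.models.items():
            step_error = 1.0 - model(state, action)  # 0 if the action was fully predicted
            self.errors[name] = (self.decay * self.errors[name]
                                 + (1.0 - self.decay) * step_error)
        # Switch only when the assumed policy's error is too high AND
        # another known policy explains recent behaviour better.
        if self.errors[self.current] > self.threshold:
            best = min(self.errors, key=self.errors.get)
            if best != self.current:
                self.current = best
                return True
        return False

    def response(self) -> str:
        """Response policy paired with the currently assumed opponent."""
        return self.responses[self.current]


# Usage: two toy deterministic opponents in a one-state world.
models = {
    "always0": lambda s, a: 1.0 if a == 0 else 0.0,
    "always1": lambda s, a: 1.0 if a == 1 else 0.0,
}
det = SwitchDetector(models, {"always0": "counter0", "always1": "counter1"})
history = [0] * 10 + [1] * 10          # opponent switches strategy at step 10
switch_steps = [t for t, a in enumerate(history) if det.observe(0, a)]
print(switch_steps, det.response())
```

With these toy settings the switch at step 10 is flagged a few steps later, once the decayed error of the old model crosses the threshold. The decay factor trades detection delay against robustness to occasional unexpected actions. This sketch only chooses among already-known opponent models.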

Findings

1. Outperforms PPO in dynamic Predator-Prey scenarios.
2. Provides accurate real-time detection of opponent policy changes.
3. Improves the robustness of multi-agent decision-making in non-stationary environments.

Abstract

In Multi-agent Reinforcement Learning (MARL), accurately perceiving opponents' strategies is essential for both cooperative and adversarial contexts, particularly within dynamic environments. While Proximal Policy Optimization (PPO) and related algorithms such as Actor-Critic with Experience Replay (ACER), Trust Region Policy Optimization (TRPO), and Deep Deterministic Policy Gradient (DDPG) perform well in single-agent, stationary environments, they suffer from high variance in MARL due to non-stationary and hidden policies of opponents, leading to diminished reward performance. Additionally, existing methods in MARL face significant challenges, including the need for inter-agent communication, reliance on explicit reward information, high computational demands, and sampling inefficiencies. These issues render them less effective in continuous environments where opponents may abruptly…

Taxonomy

Topics: Smart Grid Security and Resilience · Power System Optimization and Stability · Smart Grid Energy Management

Methods: Experience Replay · Entropy Regularization · Proximal Policy Optimization