Adaptive Opponent Policy Detection in Multi-Agent MDPs: Real-Time Strategy Switch Identification Using Running Error Estimation
Mohidul Haque Mridul, Mohammad Foysal Khan, Redwan Ahmed Rizvee, Md. Mosaddek Khan

TL;DR
This paper introduces OPS-DeMo, an online method for real-time detection of opponent policy changes in multi-agent environments, enhancing robustness and decision-making accuracy in dynamic scenarios.
Contribution
We propose OPS-DeMo, a novel online algorithm that detects opponent policy switches using dynamic error decay and maintains a bank of response policies for improved adaptability.
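The contribution names two ingredients: a decaying error signal and a bank of policies. The Python sketch below shows how a detector built from them could work. It is an illustration under stated assumptions, not the authors' implementation. It assumes each opponent model in the bank returns the probability it assigns to the opponent's observed action, and it treats "error" as one minus that probability. The running error is an exponentially decayed average with illustrative `decay` and `threshold` values. `SwitchDetector`, `observe`, and these parameters are hypothetical names that do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical interface: an opponent model maps (state, observed action)
# to the probability it assigned to that action.
OpponentModel = Callable[[object, int], float]


@dataclass
class SwitchDetector:
    """Illustrative decayed running-error switch detector over a policy bank.

    Assumptions (not from the paper): error = 1 - p(observed action),
    exponential decay of the running error, and a fixed switch threshold.
    """
    bank: Dict[str, OpponentModel]
    decay: float = 0.9
    threshold: float = 0.5
    errors: Dict[str, float] = field(default_factory=dict)
    current: Optional[str] = None

    def __post_init__(self) -> None:
        # Every candidate opponent model starts with zero running error.
        self.errors = {name: 0.0 for name in self.bank}
        if self.current is None:
            # Assume the first model in the bank is the initial opponent.
            self.current = next(iter(self.bank))

    def observe(self, state: object, action: int) -> bool:
        """Update all running errors; return True when a switch is flagged."""
        for name, model in self.bank.items():
            err = 1.0 - model(state, action)
            # Recent prediction errors dominate; old ones decay geometrically.
            self.errors[name] = self.decay * self.errors[name] + (1.0 - self.decay) * err
        if self.errors[self.current] > self.threshold:
            # Re-identify the opponent as the model with the lowest running error.
            best = min(self.errors, key=lambda n: self.errors[n])
            if best != self.current:
                self.current = best
                return True
        return False


if __name__ == "__main__":
    # Two toy deterministic opponents: "chase" always plays 0, "flee" always plays 1.
    chase = lambda s, a: 1.0 if a == 0 else 0.0
    flee = lambda s, a: 1.0 if a == 1 else 0.0
    det = SwitchDetector(bank={"chase": chase, "flee": flee})
    for t in range(20):
        action = 0 if t < 10 else 1  # the opponent silently switches at t = 10
        if det.observe(None, action):
            print(f"switch detected at t={t}, opponent now '{det.current}'")
```

With these settings, the switch is flagged seven steps after the opponent changes behaviour, once the assumed model's running error crosses 0.5. A smaller `decay` reacts faster but is more sensitive to noisy actions. In a full system, `det.current` would presumably index into a bank of response policies (for example, one trained policy per known opponent), which is how the sketch connects to the response-policy bank described above.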
Findings
Outperforms PPO in dynamic Predator-Prey scenarios.
Provides accurate real-time opponent policy change detection.
Enhances multi-agent decision robustness in non-stationary environments.
Abstract
In Multi-agent Reinforcement Learning (MARL), accurately perceiving opponents' strategies is essential for both cooperative and adversarial contexts, particularly within dynamic environments. While Proximal Policy Optimization (PPO) and related algorithms such as Actor-Critic with Experience Replay (ACER), Trust Region Policy Optimization (TRPO), and Deep Deterministic Policy Gradient (DDPG) perform well in single-agent, stationary environments, they suffer from high variance in MARL due to non-stationary and hidden policies of opponents, leading to diminished reward performance. Additionally, existing methods in MARL face significant challenges, including the need for inter-agent communication, reliance on explicit reward information, high computational demands, and sampling inefficiencies. These issues render them less effective in continuous environments where opponents may abruptly…
Taxonomy
Topics: Smart Grid Security and Resilience · Power System Optimization and Stability · Smart Grid Energy Management
Methods: Experience Replay · Entropy Regularization · Proximal Policy Optimization
