LOQA: Learning with Opponent Q-Learning Awareness
Milad Aghajohari, Juan Agustin Duque, Tim Cooijmans, Aaron Courville

TL;DR
LOQA is a decentralized reinforcement learning algorithm designed to improve individual utility and cooperation among agents in general-sum games, demonstrating state-of-the-art results with low computational costs.
Contribution
LOQA introduces a novel opponent-aware Q-learning method that enhances multi-agent cooperation and efficiency in partially competitive environments.
Findings
Achieves state-of-the-art performance in benchmark games
Reduces computational complexity compared to existing methods
Effectively fosters cooperation among adversarial agents
Abstract
In various real-world scenarios, interactions among agents often resemble the dynamics of general-sum games, where each agent strives to optimize its own utility. Despite the ubiquitous relevance of such settings, decentralized machine learning algorithms have struggled to find equilibria that maximize individual utility while preserving social welfare. In this paper we introduce Learning with Opponent Q-Learning Awareness (LOQA), a novel, decentralized reinforcement learning algorithm tailored to optimizing an agent's individual utility while fostering cooperation among adversaries in partially competitive environments. LOQA assumes the opponent samples actions proportionally to their action-value function Q. Experimental results demonstrate the effectiveness of LOQA at achieving state-of-the-art performance in benchmark scenarios such as the Iterated Prisoner's Dilemma and the Coin…
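The abstract's modeling assumption (the opponent samples actions proportionally to its action-value function Q) can be made concrete with a small sketch. This is an illustration under stated assumptions, not the authors' implementation. Read literally, "proportional to Q" only defines a distribution when Q-values are nonnegative. A softmax (exponentiate first, then normalize) is one common way to handle values of any sign, and it is shown here as an assumed variant. The function names and the temperature parameter are hypothetical.

```python
import math
import random
from typing import List, Sequence


def proportional_policy(q_values: Sequence[float]) -> List[float]:
    """Literal reading: pi(a) = Q(a) / sum_b Q(b). Only defined for nonnegative Q."""
    if any(q < 0 for q in q_values):
        raise ValueError("proportional sampling needs nonnegative Q-values")
    total = sum(q_values)
    if total == 0:
        # All-zero Q-values: fall back to a uniform policy.
        return [1.0 / len(q_values)] * len(q_values)
    return [q / total for q in q_values]


def softmax_policy(q_values: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Assumed variant: pi(a) proportional to exp(Q(a) / temperature); works for any sign."""
    scaled = [q / temperature for q in q_values]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def sample_opponent_action(q_values: Sequence[float], rng: random.Random,
                           use_softmax: bool = True) -> int:
    """Draw the opponent's action index from the modeled (hypothetical) policy."""
    probs = softmax_policy(q_values) if use_softmax else proportional_policy(q_values)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

For example, `proportional_policy([3.0, 1.0])` gives `[0.75, 0.25]`. The softmax variant keeps the same ordering of actions while also accepting negative Q-values, such as those that arise when payoffs are expressed as costs.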
Peer Reviews
Decision: ICLR 2024 poster
The authors try to address the computational challenges faced by other MARL algorithms for sequential social dilemmas, by proposing an algorithm where each agent maintains an estimate of the Q values of all its opponents in order to determine its own policy improvement.
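The review summary says each agent keeps an estimate of its opponents' Q-values. One way such an estimate could be maintained is sketched below. The sketch assumes a tabular setting and a standard Q-learning (temporal-difference) update driven by the opponent's observed rewards. The class name `OpponentQEstimate`, the learning rate `lr`, the discount `gamma`, and the softmax used to turn the estimate into a modeled opponent policy are all illustrative assumptions, not details confirmed by the source.

```python
import math
from collections import defaultdict


class OpponentQEstimate:
    """Hypothetical tabular estimate of one opponent's Q-values, learned from the
    opponent's observed transitions with a standard Q-learning (TD) update."""

    def __init__(self, n_actions: int, lr: float = 0.1, gamma: float = 0.9):
        self.n_actions = n_actions
        self.lr = lr        # step size (illustrative value)
        self.gamma = gamma  # discount factor (illustrative value)
        # Unseen states start with all-zero Q-values.
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def update(self, state, action, reward, next_state, done=False):
        """Move Q(state, action) toward the one-step bootstrapped target."""
        if done:
            target = reward
        else:
            target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.lr * (target - self.q[state][action])

    def modeled_policy(self, state):
        """Assumed opponent model: softmax over the estimated Q-values in `state`."""
        values = self.q[state]
        m = max(values)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in values]
        z = sum(exps)
        return [e / z for e in exps]
```

In an opponent-aware learner, the agent would then take the modeled opponent policy into account when computing its own policy update. That step depends on the paper's specific derivation and is deliberately not shown here.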
1. Several papers in the literature provide decentralized algorithms that achieve individually and socially optimal solutions in sequential social dilemmas. One criticism of these papers is the additional information these algorithms require, which is often unavailable in the real world. This paper has the same limitation. 2. I think the novelty of the proposed method is limited relative to the papers it cites. The key idea is that each agent models the opponent's policy
1. I like the idea of deriving cooperative solutions in mixed-motive games without computing meta-game solutions. There have been many efforts in this direction, but only a few paid off; the main limitation lies in the scalability of multi-agent problems. 2. The paper is clear and easy to follow.
In general, the experimental part has room for improvement. 1. Since this line of research on LOLA has several prior works, a comparison with a reasonable number of them is necessary to show that the proposed method is better. Good performance of a particular method in a particular environment is not a reason to abandon other methods, especially since POLA [1] did not compare with M-FOS [2]. 2. Results on the IPD and Coin environments may be somewhat preliminary when we jointly consider
- Solutions to conditional (equilibrium-based) cooperation in general, and improvements of LOLA in particular, are relevant to MARL. - The paper is straightforward and well written. - The experiments are sound; I especially like Fig. 2.
- Related work lacks discussion of MARL approaches to learning prosocial equilibria other than reciprocity-based or opponent-shaping-based ones, such as reward redistribution https://ala2020.vub.ac.be/papers/ALA2020_paper_45.pdf https://arxiv.org/abs/2004.13332, mediation https://arxiv.org/pdf/2306.08419.pdf, contracts https://arxiv.org/pdf/2208.10469.pdf, and similarity-based equilibria https://arxiv.org/pdf/2211.14468.pdf. - Some limitations of previous LOLA-based approaches that LOQA does not fix
Taxonomy
Topics: Data Stream Mining Techniques · Machine Learning and Data Classification · Neural Networks and Applications
Methods: Q-Learning
