TL;DR
This paper investigates how opponent modelling and opponent learning awareness influence multi-objective multi-agent interactions with non-linear utilities, showing that they can substantially change both the learning dynamics and the solutions agents converge to.
Contribution
It introduces novel actor-critic and policy gradient methods for reinforcement learning in multi-objective games with opponent modelling and learning awareness.
Findings
Opponent modelling can significantly change learning dynamics.
Learning with opponent awareness benefits agents both in games that have Nash equilibria and in games that do not.
Even in games with no Nash equilibrium, agents can still converge to solutions that approximate equilibria.
Abstract
Many real-world multi-agent interactions consider multiple distinct criteria, i.e. the payoffs are multi-objective in nature. However, the same multi-objective payoff vector may lead to different utilities for each participant. Therefore, it is essential for an agent to learn about the behaviour of other agents in the system. In this work, we present the first study of the effects of such opponent modelling on multi-objective multi-agent interactions with non-linear utilities. Specifically, we consider two-player multi-objective normal form games with non-linear utility functions under the scalarised expected returns optimisation criterion. We contribute novel actor-critic and policy gradient formulations to allow reinforcement learning of mixed strategies in this setting, along with extensions that incorporate opponent policy reconstruction and learning with opponent learning awareness…
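The setting described in the abstract can be illustrated with a small numerical sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 2x2 payoff table, the two utility functions, and the uniform mixed strategies are all hypothetical. The sketch computes the scalarised expected return (SER), i.e. each agent's utility applied to the expected payoff vector, and shows how the same expected vector yields different utilities for the two players.

```python
import numpy as np

# Hypothetical two-objective 2x2 game (not from the paper).
# payoffs[i, j] is the payoff vector both players receive when the
# row player picks action i and the column player picks action j.
payoffs = np.array([
    [[4.0, 0.0], [2.0, 2.0]],
    [[2.0, 2.0], [0.0, 4.0]],
])


def expected_payoff(pi_row, pi_col, payoffs):
    """Expected payoff vector under independent mixed strategies."""
    return np.einsum("i,j,ijk->k", pi_row, pi_col, payoffs)


def utility_row(v):
    # Assumed non-linear utility for the row player: sum of squares.
    return v[0] ** 2 + v[1] ** 2


def utility_col(v):
    # Assumed non-linear utility for the column player: product.
    return v[0] * v[1]


def ser(pi_row, pi_col, utility):
    """Scalarised expected return: utility of the expected payoff vector."""
    return utility(expected_payoff(pi_row, pi_col, payoffs))


def expected_utility(pi_row, pi_col, utility):
    """For contrast only: expectation of the utility of each outcome."""
    joint = np.outer(pi_row, pi_col)
    per_outcome = np.apply_along_axis(utility, 2, payoffs)
    return float(np.sum(joint * per_outcome))


# Uniform mixed strategies, chosen arbitrarily for the example.
pi_row = np.array([0.5, 0.5])
pi_col = np.array([0.5, 0.5])

ser_row = ser(pi_row, pi_col, utility_row)
ser_col = ser(pi_row, pi_col, utility_col)
eu_row = expected_utility(pi_row, pi_col, utility_row)

print("expected payoff vector:", expected_payoff(pi_row, pi_col, payoffs))
print("SER (row player):", ser_row)
print("SER (column player):", ser_col)
print("expected utility (row player), for contrast:", eu_row)
```

With a non-linear utility, the SER value generally differs from the expected utility of individual outcomes, which is why the choice of optimisation criterion matters in this setting.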
