Scale-free memory model for multiagent reinforcement learning. Mean field approximation and rock-paper-scissors dynamics
Ihor Lubashevsky, Shigeru Kanemoto

TL;DR
This paper introduces a continuous-time multiagent reinforcement learning model with scale-free memory, revealing complex dynamics including bifurcations, oscillations, and self-organization in rock-paper-scissors interactions.
Contribution
It develops a fractional differential equation framework for multiagent systems with power-law memory, analyzing stability and dynamics in non-transitive interactions.
Findings
Scale-free memory induces complex, irregular dynamics.
Multiple instability modes with bifurcations are identified.
Oscillations grow in amplitude and period over time.
Abstract
A continuous-time model for multiagent systems governed by reinforcement learning with scale-free memory is developed. The agents are assumed to act independently of one another in optimizing their choice of possible actions via trial-and-error search. To estimate the value of each action, the agents accumulate in their memory the rewards obtained from taking that action at each moment of time. The contribution of past rewards to an agent's current perception of action value is described by an integral operator with a power-law kernel. Finally, a fractional differential equation governing the system dynamics is obtained. The agents are considered to interact with one another implicitly, via the reward of one agent depending on the choices of the other agents. The pairwise interaction model is adopted to describe this effect. As a specific example of systems with…
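The memory mechanism described in the abstract can be illustrated with a small discrete-time sketch. This is not the authors' model: the kernel form (1 + age/t0)^(-exponent), the softmax choice rule, the mean-field reward p_i (A p)_i, and all names and parameter values (exponent, t0, beta, dt) are assumptions made here for illustration only.

```python
import numpy as np

# Rock-paper-scissors payoff: PAYOFF[i, j] is the reward for playing
# action i against action j (0 = rock, 1 = paper, 2 = scissors).
PAYOFF = np.array([[0.0, -1.0, 1.0],
                   [1.0, 0.0, -1.0],
                   [-1.0, 1.0, 0.0]])


def power_law_weights(n_steps, dt, exponent, t0):
    """Weights for stored rewards, oldest first (assumed kernel form)."""
    ages = dt * np.arange(n_steps - 1, -1, -1)  # age of each stored reward
    return (1.0 + ages / t0) ** (-exponent)


def action_values(reward_history, dt, exponent=0.5, t0=1.0):
    """Power-law weighted sum of past rewards, one value per action."""
    history = np.asarray(reward_history, dtype=float)  # (n_steps, n_actions)
    w = power_law_weights(history.shape[0], dt, exponent, t0)
    return dt * (w[:, None] * history).sum(axis=0)


def softmax(q, beta=1.0):
    """Choice probabilities from action values (assumed choice rule)."""
    z = beta * (q - q.max())  # shift by the maximum for numerical stability
    p = np.exp(z)
    return p / p.sum()


def simulate(n_steps=200, dt=0.1, exponent=0.5, beta=2.0, seed=0):
    """Iterate the mean-field choice probabilities for n_steps steps."""
    rng = np.random.default_rng(seed)
    p = rng.dirichlet(np.ones(3))  # random initial choice probabilities
    rewards = []
    trajectory = [p]
    for _ in range(n_steps):
        # Assumed mean-field reward rate: an action is rewarded only when
        # it is taken, so its expected payoff is scaled by p.
        rewards.append(p * (PAYOFF @ p))
        q = action_values(rewards, dt, exponent)
        p = softmax(q, beta)
        trajectory.append(p)
    return np.array(trajectory)
```

Calling simulate() returns an array of shape (n_steps + 1, 3) whose rows are the choice probabilities over time. The full reward history is re-weighted at every step because a power-law kernel, unlike an exponential one, cannot be updated recursively from the previous value alone.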
