Multi-agent Hierarchical Reinforcement Learning with Dynamic Termination
Dongge Han, Wendelin Boehmer, Michael Wooldridge, Alex Rogers

TL;DR
This paper introduces a dynamic termination approach for hierarchical multi-agent reinforcement learning that lets agents switch options flexibly while keeping their broadcast intentions predictable. The approach is evaluated on pursuit and taxi tasks.
Contribution
It proposes a novel dynamic termination Bellman equation for flexible option termination in multi-agent hierarchical reinforcement learning.
Findings
Agents learn to adapt termination behaviors across scenarios.
The method improves the balance between flexibility and predictability.
Empirical results show effective adaptation in pursuit and taxi tasks.
Abstract
In a multi-agent system, an agent's optimal policy will typically depend on the policies chosen by others. Therefore, a key issue in multi-agent systems research is that of predicting the behaviours of others, and responding promptly to changes in such behaviours. One obvious possibility is for each agent to broadcast their current intention, for example, the currently executed option in a hierarchical reinforcement learning framework. However, this approach results in inflexibility of agents if options have an extended duration and are dynamic. While adjusting the executed option at each step improves flexibility from a single-agent perspective, frequent changes in options can induce inconsistency between an agent's actual behaviour and its broadcast intention. In order to balance flexibility and predictability, we propose a dynamic termination Bellman equation that allows the agents…
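The trade-off described in the abstract can be illustrated with a small sketch. The abstract is cut off before it states the equation, so this is not the authors' formulation: it is a hypothetical rule, assuming an agent holds a value estimate for each option and abandons its current (broadcast) option only when another option is better by more than a switching cost. The names `choose_option`, `q_values`, and `switch_cost`, and the option labels, are illustrative assumptions.

```python
from typing import Dict


def choose_option(q_values: Dict[str, float], current: str, switch_cost: float) -> str:
    """Hypothetical termination rule (an assumption, not the paper's equation).

    Keep the currently broadcast option unless the best alternative
    exceeds its estimated value by more than `switch_cost`.
    """
    best = max(q_values, key=q_values.get)
    if best != current and q_values[best] - switch_cost > q_values[current]:
        return best  # terminate the current option and switch
    return current  # continue, so behaviour stays consistent with the broadcast


# Illustrative value estimates for two made-up options.
q = {"chase_prey_A": 1.0, "chase_prey_B": 1.2}

# A high cost favours predictability: the small gain does not justify switching.
print(choose_option(q, "chase_prey_A", switch_cost=0.5))  # chase_prey_A

# A low cost favours flexibility: the agent switches to the better option.
print(choose_option(q, "chase_prey_A", switch_cost=0.1))  # chase_prey_B
```

Under this assumed rule, a switching cost of zero corresponds to re-selecting the option at every step, while a very large cost means the agent never abandons an option early.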
