Multi-Agent Learning in Contextual Games under Unknown Constraints
Anna M. Maddux, Maryam Kamgarpour

TL;DR
This paper introduces a learning algorithm for multi-agent contextual games with unknown constraints. The algorithm achieves low regret while driving the time-averaged constraint violation to zero, and it is supported by theoretical guarantees and experimental validation.
Contribution
It develops c.z.AdaNormalGP, a no-regret, no-violation algorithm for contextual games with unknown constraints, with kernel-based regret bounds and equilibrium concepts.
Findings
The algorithm achieves sublinear regret bounds.
The time-averaged constraint violation converges to zero as the game is repeated.
The algorithm is effective in multi-agent reinforcement learning experiments.
Abstract
We consider the problem of learning to play a repeated contextual game with unknown reward and constraint functions. Such games arise in applications where each agent's action needs to belong to a feasible set, but the feasible set is a priori unknown. For example, in constrained multi-agent reinforcement learning, the constraints on the agents' policies are a function of the unknown dynamics and hence are themselves unknown. Under kernel-based regularity assumptions on the unknown functions, we develop a no-regret, no-violation approach that exploits similarities among different reward and constraint outcomes. The no-violation property ensures that the time-averaged sum of constraint violations converges to zero as the game is repeated. We show that our algorithm, referred to as c.z.AdaNormalGP, obtains kernel-dependent regret bounds and that the cumulative constraint…
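The no-violation property described in the abstract concerns a time average, not every individual round. The sketch below is only an illustration of that quantity, not the paper's algorithm: it assumes a hypothetical sign convention in which a round is feasible when its constraint value is at most zero, and the helper name `running_average_violation` and the sample values are made up for the example.

```python
from itertools import accumulate


def running_average_violation(constraint_values):
    """Running time-average of constraint violations.

    Assumption (not from the paper): a round is feasible when its
    constraint value is <= 0, so the violation in a round is max(0, value).
    """
    violations = [max(0.0, v) for v in constraint_values]
    cumulative = accumulate(violations)  # running sum of violations
    # Divide the running sum after round t by t to get the time average.
    return [total / t for t, total in enumerate(cumulative, start=1)]


# Made-up constraint values: violations happen only in early rounds.
example = [1.0, -0.5, 0.5, -1.0, -0.2, -0.3]
averages = running_average_violation(example)
print(averages)
```

In this toy sequence the cumulative violation stops growing after the third round, so the running average keeps shrinking as more rounds are played. A no-violation guarantee asks for this behavior in the limit, which allows occasional violations as long as their cumulative sum grows sublinearly.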
Taxonomy
Topics: Advanced Bandit Algorithms Research · Decision-Making and Behavioral Economics
