ColorGrid: A Multi-Agent Non-Stationary Environment for Goal Inference and Assistance
Andrey Risukhin, Kavel Rao, Ben Caffee, Alan Fan

TL;DR
ColorGrid is a multi-agent environment for evaluating goal inference and assistance in non-stationary, asymmetric settings. It exposes limitations of current algorithms such as IPPO and provides benchmarking resources for future research.
Contribution
We introduce ColorGrid, a customizable multi-agent reinforcement learning (MARL) environment with non-stationarity and asymmetry, and show that it challenges state-of-the-art algorithms such as IPPO.
Findings
IPPO struggles with simultaneous non-stationary and asymmetric goals.
ColorGrid reveals limitations of current MARL algorithms.
Benchmarking resources are provided for future research.
Abstract
Autonomous agents' interactions with humans are increasingly focused on adapting to humans' changing preferences in order to improve assistance in real-world tasks. To collaborate well, effective agents must learn to accurately infer human goals, which are often hidden. However, existing Multi-Agent Reinforcement Learning (MARL) environments lack the attributes needed to rigorously evaluate these agents' learning capabilities. To this end, we introduce ColorGrid, a novel MARL environment with customizable non-stationarity, asymmetry, and reward structure. We investigate the performance of Independent Proximal Policy Optimization (IPPO), a state-of-the-art (SOTA) MARL algorithm, in ColorGrid and find through extensive ablations that, particularly with simultaneous non-stationary and asymmetric goals between a "leader" agent representing a human and a "follower" assistant…
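To make the two properties named in the abstract concrete, the following is a minimal toy sketch. It is not the ColorGrid API or the authors' implementation. The class `ToyGoalEnv`, the `switch_prob` parameter, the three color names, and the +1/-1 shared reward are all hypothetical choices made for illustration. The sketch shows asymmetry (only the leader observes the goal color) and non-stationarity (the goal color can change between steps), with a follower that infers the goal by copying the leader's previous choice.

```python
import random

COLORS = ("red", "green", "blue")  # hypothetical color set


class ToyGoalEnv:
    """Hypothetical toy environment; NOT the ColorGrid API."""

    def __init__(self, switch_prob=0.1, seed=0):
        self.switch_prob = switch_prob  # per-step chance that the goal changes
        self.rng = random.Random(seed)
        self.goal = self.rng.choice(COLORS)

    def observe(self, agent):
        # Asymmetry: only the leader is told the current goal color.
        return {"goal": self.goal if agent == "leader" else None}

    def step(self, actions):
        # Assumed shared reward: +1 per agent that picks the goal color, else -1.
        reward = sum(1 if color == self.goal else -1 for color in actions.values())
        # Non-stationarity: the hidden goal may switch to a different color.
        if self.rng.random() < self.switch_prob:
            self.goal = self.rng.choice([c for c in COLORS if c != self.goal])
        return reward


def run_episode(env, steps=50):
    """Leader follows its observed goal; follower copies the leader's last pick."""
    last_leader_pick = None
    total = 0
    for _ in range(steps):
        leader_pick = env.observe("leader")["goal"]
        # The follower never sees the goal, so it infers it from past behavior.
        follower_pick = last_leader_pick if last_leader_pick else COLORS[0]
        total += env.step({"leader": leader_pick, "follower": follower_pick})
        last_leader_pick = leader_pick
    return total
```

In this toy, raising `switch_prob` makes the follower's copied guess stale more often, which is one simple way to see why hidden goals that also change over time are harder to assist with than either property alone.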
Taxonomy
Topics: Data Visualization and Analytics · Color perception and design · Data Management and Algorithms
