Reinforcement Learning for Heterogeneous Teams with PALO Bounds
Roi Ceren, Prashant Doshi, Keyang He

TL;DR
This paper develops reinforcement learning methods for heterogeneous multi-agent systems with factored rewards, introducing PALO bounds for sample complexity analysis and demonstrating improved efficiency in diverse robotic coordination tasks.
Contribution
The paper presents two novel RL templates for heterogeneous teams with factored rewards and uses PALO bounds to analyze their sample complexity.
Findings
MCES-FMP outperforms MCES-MP in sample efficiency
Inclusion of policy space pruning improves learning speed
Approaches successfully applied to robotic coordination domains
Abstract
We introduce reinforcement learning for heterogeneous teams in which rewards for an agent are additively factored into local costs, stimuli unique to each agent, and global rewards, those shared by all agents in the domain. Motivating domains include coordination of varied robotic platforms, which incur different costs for the same action, but share an overall goal. We present two templates for learning in this setting with factored rewards: a generalization of Perkins' Monte Carlo exploring starts for POMDPs to canonical MPOMDPs, with a single policy mapping joint observations of all agents to joint actions (MCES-MP); and another with each agent individually mapping joint observations to their own action (MCES-FMP). We use probably approximately local optimal (PALO) bounds to analyze sample complexity, instantiating these templates to PALO learning. We promote sample efficiency by…
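To make the abstract's setup concrete, here is a minimal sketch of an additively factored reward and of the two policy shapes it contrasts. Everything in it is an illustrative assumption rather than the paper's implementation: the agent names (`uav`, `rover`), the action and observation sets, the cost values, the `global_reward` rule, and the restriction to memoryless policies over the current joint observation.

```python
from itertools import product

# Hypothetical toy team: two heterogeneous agents with the same action set
# but different local costs for the same action (all values are assumed).
AGENTS = ["uav", "rover"]
OBSERVATIONS = {"uav": ["clear", "target"], "rover": ["clear", "target"]}
LOCAL_COST = {
    "uav": {"scan": -1.0, "wait": 0.0},
    "rover": {"scan": -3.0, "wait": 0.0},
}


def global_reward(joint_action):
    """Reward shared by all agents (assumed rule: 10 if any agent scans)."""
    return 10.0 if "scan" in joint_action.values() else 0.0


def agent_reward(agent, joint_action):
    """Additively factored reward: shared global reward plus own local cost."""
    return global_reward(joint_action) + LOCAL_COST[agent][joint_action[agent]]


# Joint observations are tuples ordered as in AGENTS: (uav_obs, rover_obs).
JOINT_OBS = list(product(*(OBSERVATIONS[a] for a in AGENTS)))

# MCES-MP shape: a single policy maps each joint observation to a joint action.
mp_policy = {obs: {"uav": "scan", "rover": "wait"} for obs in JOINT_OBS}

# MCES-FMP shape: each agent maps the joint observation to its own action.
fmp_policy = {
    "uav": {obs: "scan" for obs in JOINT_OBS},
    "rover": {obs: "wait" for obs in JOINT_OBS},
}


def fmp_joint_action(policy, joint_obs):
    """Assemble a joint action from the per-agent policies."""
    return {agent: table[joint_obs] for agent, table in policy.items()}


joint_action = fmp_joint_action(fmp_policy, ("clear", "target"))
rewards = {agent: agent_reward(agent, joint_action) for agent in AGENTS}
```

In this toy instance both agents receive the same global reward, while the rover would pay a higher local cost than the UAV for the same `scan` action, which is the kind of heterogeneity the abstract describes. The two policy shapes select the same joint action here; they differ in whether one table or one table per agent is being learned.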
