Reinforcement Learning for Heterogeneous Teams with PALO Bounds

Roi Ceren; Prashant Doshi; Keyang He

arXiv:1805.09267·cs.LG·November 19, 2020

TL;DR

This paper develops reinforcement learning methods for heterogeneous multi-agent teams with factored rewards, applies probably approximately local optimal (PALO) bounds to analyze sample complexity, and reports improved sample efficiency on robotic coordination tasks.

Contribution

The paper presents two RL templates for heterogeneous teams with factored rewards, MCES-MP and MCES-FMP, and instantiates both with PALO bounds to analyze their sample complexity.

Findings

01 MCES-FMP outperforms MCES-MP in sample efficiency

02 Inclusion of policy space pruning improves learning speed

03 Approaches successfully applied to robotic coordination domains

Abstract

We introduce reinforcement learning for heterogeneous teams in which rewards for an agent are additively factored into local costs, stimuli unique to each agent, and global rewards, those shared by all agents in the domain. Motivating domains include coordination of varied robotic platforms, which incur different costs for the same action, but share an overall goal. We present two templates for learning in this setting with factored rewards: a generalization of Perkins' Monte Carlo exploring starts for POMDPs to canonical MPOMDPs, with a single policy mapping joint observations of all agents to joint actions (MCES-MP); and another with each agent individually mapping joint observations to their own action (MCES-FMP). We use probably approximately local optimal (PALO) bounds to analyze sample complexity, instantiating these templates to PALO learning. We promote sample efficiency by…
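The two ideas in the abstract, additively factored rewards and the difference between the MCES-MP and MCES-FMP policy shapes, can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the paper: the function names, the toy observations and actions, and the convention that a local cost is a positive number subtracted from the shared global reward.

```python
def agent_rewards(global_reward, local_costs):
    """Per-agent reward: the shared global reward minus that agent's own cost.

    Assumption: costs are given as positive magnitudes, one per agent.
    """
    return [global_reward - cost for cost in local_costs]


def joint_action_mp(joint_policy, joint_obs):
    """MCES-MP shape: one policy maps a joint observation to a joint action."""
    return joint_policy[joint_obs]


def joint_action_fmp(agent_policies, joint_obs):
    """MCES-FMP shape: each agent maps the joint observation to its own action.

    The joint action is assembled from the agents' individual choices.
    """
    return tuple(policy[joint_obs] for policy in agent_policies)


# Toy example: two robots that pay different costs for the same action.
joint_obs = ("clear", "blocked")

# A single joint policy (MCES-MP shape).
mp_policy = {joint_obs: ("move", "wait")}

# One policy per agent, each conditioned on the joint observation (MCES-FMP shape).
fmp_policies = [{joint_obs: "move"}, {joint_obs: "wait"}]

# Both agents share a global reward of 10 but incur different local costs.
rewards = agent_rewards(10, [1, 3])
```

In this toy setup both representations select the same joint action; the difference lies in what is learned, one table over joint actions versus one smaller table per agent.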
