Graph Policy Gradients for Large Scale Robot Control
Arbaaz Khan, Ekaterina Tolstaya, Alejandro Ribeiro, Vijay Kumar

TL;DR
This paper introduces Graph Policy Gradients, a scalable reinforcement learning method using graph convolutional networks to control large homogeneous robot swarms efficiently and transfer policies across different swarm sizes.
Contribution
The paper proposes a novel graph-based policy gradient algorithm that leverages graph symmetry and local filters for scalable, transferable control policies in large robot swarms.
Findings
Scales better than existing reinforcement learning methods that use fully connected networks.
Enables zero-shot transfer of policies from small to large robot groups.
Demonstrates effectiveness in formation flying tasks.
Abstract
In this paper, we consider the problem of learning policies to control a large number of homogeneous robots. To this end, we propose a new algorithm, Graph Policy Gradients (GPG), that exploits the underlying graph symmetry among the robots. The curse of dimensionality encountered when working with a large number of robots is mitigated by using a graph convolutional network (GCN) to parametrize the robots' policies. The GCN reduces the dimensionality of the problem by learning filters that aggregate information among robots locally, much as a convolutional neural network learns local features in an image. Through experiments on formation flying, we show that our proposed method scales better than existing reinforcement learning methods that employ fully connected networks. More importantly, we show that by using our locally learned filters we are…
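The local aggregation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a polynomial graph filter of the form y = sum_k S^k x H_k, a common way to build graph convolutions, and it assumes a ring-shaped communication graph purely for illustration. The names `ring_adjacency`, `aggregate`, and `graph_filter` are hypothetical.

```python
def ring_adjacency(n):
    """Adjacency matrix of a ring: robot i talks to robots i-1 and i+1 (assumed topology)."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        adj[i][(i - 1) % n] = 1
        adj[i][(i + 1) % n] = 1
    return adj


def aggregate(adj, x):
    """One round of local exchange: each robot sums its neighbours' feature vectors."""
    n, f = len(x), len(x[0])
    return [[sum(adj[i][j] * x[j][k] for j in range(n)) for k in range(f)]
            for i in range(n)]


def graph_filter(adj, x, taps):
    """Apply y = sum_k S^k x H_k, where taps[k] is an (f_in x f_out) weight matrix.

    The weights depend only on the hop count k, never on the number of robots,
    so the same taps can be reused on a graph of any size.
    """
    n = len(x)
    f_out = len(taps[0][0])
    y = [[0.0] * f_out for _ in range(n)]
    z = x  # z holds S^k x, starting with k = 0
    for k, h in enumerate(taps):
        if k > 0:
            z = aggregate(adj, z)  # move one hop further out
        for i in range(n):
            for o in range(f_out):
                y[i][o] += sum(z[i][c] * h[c][o] for c in range(len(h)))
    return y


# Two filter taps (0-hop and 1-hop), one input feature, one output feature.
taps = [[[1.0]], [[0.5]]]

small = graph_filter(ring_adjacency(4), [[1.0]] * 4, taps)
large = graph_filter(ring_adjacency(10), [[1.0]] * 10, taps)
```

Because the filter weights are indexed by hop count rather than by robot, the same `taps` run unchanged on 4 robots and on 10. That is the structural property that makes transferring a policy across swarm sizes possible in principle, although this sketch contains no learning and no nonlinearity.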
Taxonomy
Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Advanced Bandit Algorithms Research
