Using Reinforcement Learning to Herd a Robotic Swarm to a Target Distribution
Zahi M. Kakish, Karthik Elamvazhuthi, Spring Berman

TL;DR
This paper introduces a reinforcement learning-based control strategy in which a leader agent herds a swarm of follower agents to a target distribution on a graph. The strategy scales with the number of agents and is validated in simulations and physical robot experiments.
Contribution
The paper develops a mean-field model-based reinforcement learning approach for scalable swarm herding control using only population-level observations.
Findings
Successfully trained policies in simulation on swarms ranging from 10 to 100 agents.
Transferred policies from simulation to physical robots with successful distribution control.
Demonstrated scalability and effectiveness of the approach in both simulation and real-world experiments.
Abstract
In this paper, we present a reinforcement learning approach to designing a control policy for a "leader" agent that herds a swarm of "follower" agents, via repulsive interactions, as quickly as possible to a target probability distribution over a strongly connected graph. The leader control policy is a function of the swarm distribution, which evolves over time according to a mean-field model in the form of an ordinary difference equation. The dependence of the policy on agent populations at each graph vertex, rather than on individual agent activity, simplifies the observations required by the leader and enables the control strategy to scale with the number of agents. Two Temporal-Difference learning algorithms, SARSA and Q-Learning, are used to generate the leader control policy based on the follower agent distribution and the leader's location on the graph. A simulation environment…
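The abstract names two Temporal-Difference algorithms, SARSA and Q-Learning, so a minimal tabular sketch of how their update rules differ may be useful. This is a generic textbook illustration, not the authors' implementation. The state index `s` is assumed to be some hypothetical integer encoding of the discretized follower distribution together with the leader's vertex. The function names, the step size `alpha`, the discount `gamma`, and the reward `r` are all placeholders, since this summary does not give the paper's actual values.

```python
import random

def epsilon_greedy(Q, s, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(Q[s]))
    return max(range(len(Q[s])), key=lambda a: Q[s][a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy TD update: bootstrap from the best action value in s_next."""
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update: bootstrap from the action actually chosen in s_next."""
    td_target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (td_target - Q[s][a])

# Hypothetical toy table: 2 states x 2 leader actions (assumed sizes).
Q = [[0.0, 0.0], [1.0, 3.0]]
q_learning_update(Q, s=0, a=0, r=1.0, s_next=1)          # uses max(Q[1]) = 3.0
sarsa_update(Q, s=0, a=1, r=1.0, s_next=1, a_next=0)     # uses Q[1][0] = 1.0
```

The only difference between the two updates is the bootstrap term: Q-Learning uses the greedy value of the next state regardless of what the leader does next, while SARSA uses the value of the action the current policy actually selects.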
Taxonomy
Methods: SARSA · Q-Learning
