Robot Fleet Learning via Policy Merging
Lirui Wang, Kaiqing Zhang, Allan Zhou, Max Simchowitz, Russ Tedrake

TL;DR
This paper introduces FLEET-MERGE, a distributed policy merging method for robot fleets that efficiently consolidates diverse policies without centralizing data, demonstrated on new benchmarks and multiple tasks.
Contribution
We propose FLEET-MERGE, a novel policy merging algorithm that handles permutation invariance in recurrent neural network policies for fleet learning.
Findings
FLEET-MERGE effectively consolidates policies trained on 50 tasks in Meta-World.
The method performs well on nearly all training tasks at test time.
FLEET-MERGE demonstrates efficacy on the new FLEET-TOOLS benchmark.
Abstract
Fleets of robots ingest massive amounts of heterogeneous streaming data silos generated by interacting with their environments, far more than what can be stored or transmitted with ease. At the same time, teams of robots should co-acquire diverse skills through their heterogeneous experiences in varied settings. How can we enable such fleet-level learning without having to transmit or centralize fleet-scale data? In this paper, we investigate policy merging (PoMe) from such distributed heterogeneous datasets as a potential solution. To efficiently merge policies in the fleet setting, we propose FLEET-MERGE, an instantiation of distributed learning that accounts for the permutation invariance that arises when parameterizing the control policies with recurrent neural networks. We show that FLEET-MERGE consolidates the behavior of policies trained on 50 tasks in the Meta-World environment,…
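The permutation invariance mentioned in the abstract can be illustrated with a small sketch. It is not the authors' FLEET-MERGE algorithm: it only shows, for a toy single-layer tanh RNN, that relabeling hidden units leaves the policy's outputs unchanged, that naively averaging two such relabeled copies generally does not, and that aligning the hidden units first restores the original behavior. All function names are hypothetical, and the brute-force search over permutations is an assumption chosen for brevity; it is only feasible for tiny hidden sizes.

```python
import itertools
import math
import random

def rnn_forward(params, xs):
    """Run a single-layer tanh RNN over xs and return the scalar output per step."""
    W_in, W_rec, w_out = params
    n = len(W_rec)
    h = [0.0] * n
    ys = []
    for x in xs:
        h = [math.tanh(sum(W_in[i][k] * x[k] for k in range(len(x)))
                       + sum(W_rec[i][j] * h[j] for j in range(n)))
             for i in range(n)]
        ys.append(sum(w_out[i] * h[i] for i in range(n)))
    return ys

def permute_hidden(params, perm):
    """Relabel hidden units so that new unit i is old unit perm[i]."""
    W_in, W_rec, w_out = params
    n = len(perm)
    return ([W_in[perm[i]][:] for i in range(n)],
            [[W_rec[perm[i]][perm[j]] for j in range(n)] for i in range(n)],
            [w_out[perm[i]] for i in range(n)])

def flatten(params):
    """Concatenate all weights into one flat list for distance computations."""
    W_in, W_rec, w_out = params
    return ([v for row in W_in for v in row]
            + [v for row in W_rec for v in row]
            + list(w_out))

def average(p, q):
    """Element-wise average of two parameter sets with identical shapes."""
    W_in = [[(a + b) / 2 for a, b in zip(r, s)] for r, s in zip(p[0], q[0])]
    W_rec = [[(a + b) / 2 for a, b in zip(r, s)] for r, s in zip(p[1], q[1])]
    w_out = [(a + b) / 2 for a, b in zip(p[2], q[2])]
    return (W_in, W_rec, w_out)

def align(reference, other):
    """Brute-force the hidden-unit relabeling of `other` closest to `reference`."""
    n = len(reference[1])
    ref = flatten(reference)
    def dist(perm):
        cand = flatten(permute_hidden(other, perm))
        return sum((a - b) ** 2 for a, b in zip(ref, cand))
    best = min(itertools.permutations(range(n)), key=dist)
    return permute_hidden(other, best)

def max_gap(ys, zs):
    """Largest absolute difference between two output sequences."""
    return max(abs(a - b) for a, b in zip(ys, zs))

rng = random.Random(0)
n_hidden, n_input = 4, 2
policy_a = ([[rng.uniform(-1, 1) for _ in range(n_input)] for _ in range(n_hidden)],
            [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_hidden)],
            [rng.uniform(-1, 1) for _ in range(n_hidden)])
# policy_b is the same function as policy_a with its hidden units relabeled.
policy_b = permute_hidden(policy_a, [2, 0, 3, 1])
xs = [[rng.uniform(-1, 1) for _ in range(n_input)] for _ in range(5)]

out_a = rnn_forward(policy_a, xs)
out_b = rnn_forward(policy_b, xs)
same_function_gap = max_gap(out_a, out_b)
naive_gap = max_gap(out_a, rnn_forward(average(policy_a, policy_b), xs))
aligned_gap = max_gap(out_a, rnn_forward(average(policy_a, align(policy_a, policy_b)), xs))
print(same_function_gap, naive_gap, aligned_gap)
```

In this toy setting the two policies are exact relabelings of one another, so alignment recovers the original weights. Policies trained on different data would match only approximately, which is the harder case the paper addresses.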
Peer Reviews
Decision: ICLR 2024 poster
This is a very interesting paper and is well written. The problem that they investigate is well supported and they propose (to my knowledge) a novel algorithm that allows us to merge different neural networks by aligning their internal representation spaces. They do extensive experiments in simulation and the real-world and show that policies from diverse tasks can be merged together in a smart way. They leverage known properties of RNNs (i.e. internal permutation invariance and time-invarianc
While the method proposed is clear to me, I do not understand how the merged policy can determine what task to solve during test time. Further, in the results section I did not see how well the policies perform on a single task. In figure 4, it is clear that FleetMerge performs better than Federated Averaging, however I am curious to see how much of a decrease in performance is incurred by merging the policies at all.
This paper is very novel in its problem setting. Policy merging, to the best of my knowledge, has not been explicitly studied before. This paper will have impact in the robot learning community, where large-scale datasets as well as data/model sharing across institutions are becoming increasingly prevalent. However, most such effort has primarily focused on consolidating all data to train a single large model (e.g., the recent RT-X effort), but this paper offers a fresh perspective that ope
The main weaknesses of the paper are its presentation and the strength of the experimental results. First, the presentation of the paper can be improved. In Figures 3 and 4, several charts are hard to see because the legends block them. The experiment section can be better structured by having subsections for each evaluation setting instead of each benchmark; right now, all evaluation settings are presented upfront without concrete contextualization and results that interleave between
1) The experimental results for policy merging are strong for merging RNN policies trained for different tasks. The proposed method outperforms naive averaging and other similar baselines. Distributed learning is becoming increasingly important for robotics as deployment of fleets of robots for data collection is more feasible. 2) The introduced benchmark FLEET-TOOLS is a useful contribution to the robot manipulation community, for easy collection of expert trajectories useful for training poli
1) The proposed algorithm was only demonstrated for behavior cloning, which is not what we generally consider as the setup where we benefit from distributed learning. If we have a large amount of static expert data, it is not too difficult to just merge the datasets and train a policy on the joint dataset. When we consider fleet policy learning, it is much more useful to consider a reinforcement learning setup where a collection of robots are collecting data for various tasks, and we wish to col
Taxonomy
Topics: Electric Vehicles and Infrastructure · Safety Systems Engineering in Autonomy · Energy, Environment, and Transportation Policies
