Swarm Behavior Cloning
Jonas Nüßlein, Maximilian Zorn, Philipp Altmann, Claudia Linnhoff-Popien

TL;DR
This paper introduces a method to improve ensemble behavior cloning by reducing action differences among policies, leading to enhanced robustness and performance across various environments.
Contribution
It proposes a novel approach to align ensemble policies in behavior cloning, addressing action discrepancy issues and maintaining diversity for better decision-making.
Findings
Reduced action differences among ensemble policies.
Improved mean episode returns across eight environments.
Enhanced robustness and decision-making diversity.
Abstract
In sequential decision-making environments, the primary approaches for training agents are Reinforcement Learning (RL) and Imitation Learning (IL). Unlike RL, which relies on modeling a reward function, IL leverages expert demonstrations, where an expert policy $\pi_e$ (e.g., a human) provides the desired behavior. Formally, a dataset $D$ of state-action pairs is provided: $D = \{(s, a = \pi_e(s))\}$. A common technique within IL is Behavior Cloning (BC), where a policy $\pi(s) = a$ is learned through supervised learning on $D$. Further improvements can be achieved by using an ensemble of $N$ individually trained BC policies, denoted as $E = \{\pi_i(s)\}_{1 \leq i \leq N}$. The ensemble's action $a$ for a given state $s$ is the aggregated output of the $N$ actions: $a = \frac{1}{N} \sum_{i} \pi_i(s)$. This paper addresses the issue of increasing action differences -- the observation that…
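The ensemble setup described in the abstract can be illustrated with a minimal sketch. It covers only plain ensemble Behavior Cloning, not the paper's proposed method. Everything concrete here is an assumption made for illustration: the linear `expert_policy`, the least-squares fit used as the supervised learner, bootstrap resampling to make the members differ, mean aggregation of the member actions, and the helper `mean_action_difference`, which reports how far the members' actions lie from each other in a given state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert: a fixed linear map from 3-d states to 2-d actions.
W_EXPERT = np.array([[1.0, -2.0], [0.5, 0.3], [-1.0, 0.8]])

def expert_policy(states):
    return states @ W_EXPERT

def fit_bc_policy(states, actions):
    # Behavior cloning as supervised regression (least squares here).
    W, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return W

def ensemble_action(weights, state):
    # Each member proposes an action; the ensemble returns their mean.
    member_actions = np.stack([state @ W for W in weights])
    return member_actions.mean(axis=0), member_actions

def mean_action_difference(member_actions):
    # Mean pairwise Euclidean distance between member actions.
    n = len(member_actions)
    diffs = [np.linalg.norm(member_actions[i] - member_actions[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(diffs))

# Demonstration dataset D: expert actions with a little label noise.
states = rng.normal(size=(200, 3))
actions = expert_policy(states) + 0.1 * rng.normal(size=(200, 2))

# Train N policies individually, each on its own bootstrap resample of D.
N = 5
policies = []
for _ in range(N):
    idx = rng.integers(0, len(states), size=len(states))
    policies.append(fit_bc_policy(states[idx], actions[idx]))

s = np.array([0.5, -1.0, 2.0])
a_ens, member_actions = ensemble_action(policies, s)
print("ensemble action:", a_ens)
print("mean action difference:", mean_action_difference(member_actions))
```

In this toy setting the members agree closely because the learner matches the expert's form. With neural network policies and states outside the demonstration data, the spread reported by `mean_action_difference` would be expected to grow, which is the kind of action difference the abstract refers to.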
Taxonomy
Topics: Advanced Chemical Sensor Technologies
