Behavioral Cloning via Search in Video PreTraining Latent Space
Federico Malato, Florian Leopold, Amogh Raut, Ville Hautamäki, Andrew Melnik

TL;DR
This paper introduces a method for autonomous agents to imitate expert behavior in Minecraft by searching over a dataset of demonstrations in a latent space, enabling human-like actions through proximity-based trajectory matching.
Contribution
It presents a novel imitation learning approach that uses latent space search over a large dataset to replicate expert trajectories in complex environments.
Findings
Effective recovery of meaningful demonstration trajectories
Agents exhibit human-like behavior in Minecraft
Proximity search in latent space enables imitation
Abstract
Our aim is to build autonomous agents that can solve tasks in environments like Minecraft. To do so, we use an imitation learning-based approach. We formulate our control problem as a search problem over a dataset of experts' demonstrations, where the agent copies actions from a similar demonstration trajectory of image-action pairs. We perform a proximity search over the BASALT MineRL dataset in the latent representation of a Video PreTraining model. The agent copies the actions from the selected expert trajectory as long as the state representations of the agent and of that trajectory do not diverge, that is, as long as the distance between them stays small. Once they diverge, the proximity search is repeated. Our approach can effectively recover meaningful demonstration trajectories and shows human-like behavior of an agent in the Minecraft environment.
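The search-and-copy loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the latents have already been computed by some encoder (the Video PreTraining model itself is not called here), uses plain L2 distance, and treats divergence as a fixed threshold on that distance. The names `nearest_expert_frame`, `search_and_copy`, `episode_ids`, and `threshold` are hypothetical.

```python
import numpy as np


def nearest_expert_frame(agent_latent, expert_latents):
    """Return (index, distance) of the expert frame closest to the agent (L2)."""
    dists = np.linalg.norm(expert_latents - agent_latent, axis=1)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])


def search_and_copy(agent_latents, expert_latents, expert_actions,
                    episode_ids, threshold):
    """Copy expert actions, repeating the search whenever the agent diverges.

    agent_latents stands in for the encoder output at each step; in a real
    control loop each latent would be computed from the current observation.
    The threshold rule is an assumption, not taken from the paper.
    """
    actions, n_searches = [], 0
    ptr, episode = None, None
    for z in agent_latents:
        following = (
            ptr is not None
            and ptr < len(expert_latents)       # demonstration data not exhausted
            and episode_ids[ptr] == episode     # still inside the same trajectory
            and np.linalg.norm(expert_latents[ptr] - z) <= threshold
        )
        if not following:
            # Diverged (or first step): pick a new reference frame by proximity.
            ptr, _ = nearest_expert_frame(z, expert_latents)
            episode = episode_ids[ptr]
            n_searches += 1
        actions.append(expert_actions[ptr])     # copy the expert's action
        ptr += 1                                # advance along the trajectory
    return actions, n_searches


# Toy data: two expert trajectories (episodes 0 and 1) in a 2-D latent space.
expert_latents = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
                           [10.0, 10.0], [11.0, 10.0]])
expert_actions = ["a", "b", "c", "d", "e"]
episode_ids = np.array([0, 0, 0, 1, 1])
agent_latents = np.array([[0.1, 0.0], [1.1, 0.0], [10.2, 10.0], [11.0, 10.1]])

actions, n_searches = search_and_copy(
    agent_latents, expert_latents, expert_actions, episode_ids, threshold=0.5)
print(actions, n_searches)
```

In this toy run the agent follows the first trajectory for two steps, drifts far from its third frame, and a second search moves it onto the other trajectory. A brute-force nearest-neighbour scan is used only for brevity. A large demonstration dataset would likely need an approximate index.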
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis
