# Learning a Behavioral Repertoire from Demonstrations

**Authors:** Niels Justesen, Miguel Gonzalez Duque, Daniel Cabarcas Jaramillo, Jean-Baptiste Mouret, Sebastian Risi

arXiv: 1907.03046 · 2019-07-09

## TL;DR

This paper introduces Behavioral Repertoire Imitation Learning (BRIL), a method that learns a set of diverse behaviors from demonstrations, enabling precise control and adaptation of policies in complex tasks like StarCraft II.

## Contribution

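BRIL extends imitation learning by augmenting each demonstration's state-action pairs with a description of its behavior, yielding a single policy conditioned on that description. The policy can then be steered toward a chosen behavior, and adapting that behavior between games outperforms a traditional IL baseline.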
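The conditioning idea, augmenting each demonstration's state-action pairs with its behavior description, can be illustrated with a minimal Python sketch. It assumes the behavior description is a fixed-length vector per demonstration and that the policy is conditioned by simply concatenating that vector to the state; the names `Demonstration`, `augment`, and `condition` are hypothetical and not taken from the paper, whose exact network input layout may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Demonstration:
    """Hypothetical container for one replay."""
    states: List[List[float]]   # state features at each decision point
    actions: List[int]          # action taken in each state
    behavior: List[float]       # behavior description of the whole replay


def condition(state: Sequence[float], behavior: Sequence[float]) -> List[float]:
    """Conditioned policy input (assumed): the state with the behavior vector appended."""
    return list(state) + list(behavior)


def augment(demos: Sequence[Demonstration]) -> List[Tuple[List[float], int]]:
    """Attach each demonstration's behavior vector to all of its state-action pairs."""
    dataset: List[Tuple[List[float], int]] = []
    for demo in demos:
        if len(demo.states) != len(demo.actions):
            raise ValueError("each state needs exactly one action")
        for state, action in zip(demo.states, demo.actions):
            dataset.append((condition(state, demo.behavior), action))
    return dataset


if __name__ == "__main__":
    demo = Demonstration(states=[[0.0, 1.0], [1.0, 1.0]], actions=[3, 5], behavior=[0.2, -0.1])
    print(augment([demo]))  # [([0.0, 1.0, 0.2, -0.1], 3), ([1.0, 1.0, 0.2, -0.1], 5)]
    # At play time the behavior vector is chosen freely to steer the policy.
    print(condition([0.5, 0.5], [1.0, 0.0]))  # [0.5, 0.5, 1.0, 0.0]
```

Any supervised classifier could be trained on the augmented pairs; because the same behavior vector is repeated for every step of a replay, the policy learns to associate it with that replay's overall play style.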

## Key findings

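- The learned policy can be manipulated to express distinct behaviors.
- A low-dimensional behavioral space, built with PCA from each demonstration's army unit composition, captures meaningful variation between demonstrations.
- Adapting the behavior between games with UCB1 yields performance beyond the traditional IL baseline.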
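A PCA-based behavior space of the kind described above can be sketched in plain Python. This version uses only the standard library (a covariance matrix plus power iteration with deflation) to stay self-contained; in practice a library routine such as scikit-learn's `PCA` would be the natural choice. Converting raw unit counts into proportions with `normalize_counts` is an assumption made for illustration, the unit counts are invented toy data, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def normalize_counts(counts: Sequence[float]) -> List[float]:
    """Assumed preprocessing: unit counts -> proportions of the army."""
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else [0.0] * len(counts)


def covariance(rows: Matrix) -> Tuple[List[float], Matrix]:
    """Column means and sample covariance matrix (needs at least two rows)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return mean, cov


def top_components(cov: Matrix, k: int, iters: int = 500, seed: int = 0) -> Matrix:
    """Top-k eigenvectors of a covariance (symmetric PSD) matrix via power iteration."""
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]
    comps: Matrix = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left; keep the current unit vector
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for this direction.
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate so the next pass finds the next-largest component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def project(row: Sequence[float], mean: Sequence[float], comps: Matrix) -> List[float]:
    """Coordinates of one army composition in the low-dimensional behavior space."""
    return [sum((row[j] - mean[j]) * c[j] for j in range(len(row))) for c in comps]


if __name__ == "__main__":
    # Hypothetical unit counts for three unit types in five replays.
    counts = [[10, 0, 0], [0, 10, 0], [5, 5, 0], [8, 2, 0], [2, 8, 0]]
    rows = [normalize_counts(c) for c in counts]
    mean, cov = covariance(rows)
    comps = top_components(cov, k=1)
    print([project(r, mean, comps) for r in rows])
```

Each demonstration's projected coordinates would then act as its behavior description. The sign of each principal axis is arbitrary, so an axis may come out flipped without changing what it represents.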

## Abstract

Imitation Learning (IL) is a machine learning approach to learn a policy from a dataset of demonstrations. IL can be useful to kick-start learning before applying reinforcement learning (RL) but it can also be useful on its own, e.g. to learn to imitate human players in video games. However, a major limitation of current IL approaches is that they learn only a single "average" policy based on a dataset that possibly contains demonstrations of numerous different types of behaviors. In this paper, we propose a new approach called Behavioral Repertoire Imitation Learning (BRIL) that instead learns a repertoire of behaviors from a set of demonstrations by augmenting the state-action pairs with behavioral descriptions. The outcome of this approach is a single neural network policy conditioned on a behavior description that can be precisely modulated. We apply this approach to train a policy on 7,777 human replays to perform build-order planning in StarCraft II. Principal Component Analysis (PCA) is applied to construct a low-dimensional behavioral space from the high-dimensional army unit composition of each demonstration. The results demonstrate that the learned policy can be effectively manipulated to express distinct behaviors. Additionally, by applying the UCB1 algorithm, we are able to adapt the behavior of the policy - in-between games - to reach a performance beyond that of the traditional IL baseline approach.
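The between-game adaptation with UCB1 can be sketched as a multi-armed bandit in which each arm is a candidate behavior vector and the reward is the game outcome. This is a minimal sketch under stated assumptions: the candidate set, the win/loss reward in [0, 1], the `simulate` helper, and its win probabilities are all hypothetical, and the paper's exact arm definition may differ. The selection rule is the standard UCB1 index, the mean reward plus sqrt(2 ln n / n_i), where n is the total number of games and n_i the number of games played with arm i.

```python
import math
import random
from typing import List, Sequence


class UCB1:
    """UCB1 bandit whose arms are candidate behavior vectors.

    Rewards are assumed to lie in [0, 1], e.g. 1.0 for a win and 0.0 for a loss.
    """

    def __init__(self, arms: Sequence[Sequence[float]]):
        self.arms = [list(a) for a in arms]
        self.counts = [0] * len(self.arms)    # games played with each behavior
        self.values = [0.0] * len(self.arms)  # running mean reward per behavior

    def select(self) -> int:
        """Index of the behavior to use in the next game."""
        for i, c in enumerate(self.counts):
            if c == 0:
                return i  # try every behavior once before using the bound
        total = sum(self.counts)
        scores = [v + math.sqrt(2.0 * math.log(total) / c)
                  for v, c in zip(self.values, self.counts)]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        """Record the outcome of a game played with behavior `arm`."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(candidates: Sequence[Sequence[float]], win_probs: Sequence[float],
             games: int, seed: int = 0) -> List[int]:
    """Play simulated games; win_probs are hypothetical per-behavior win rates."""
    rng = random.Random(seed)
    bandit = UCB1(candidates)
    for _ in range(games):
        arm = bandit.select()
        # bandit.arms[arm] is the vector a real agent would condition its policy on.
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0
        bandit.update(arm, reward)
    return bandit.counts


if __name__ == "__main__":
    candidates = [[-1.0, 0.0], [0.0, 0.0], [1.0, 0.5]]
    print(simulate(candidates, [0.2, 0.8, 0.5], games=2000, seed=1))
```

In the simulated run, UCB1 plays every candidate once and then spends most games on the behavior with the highest win rate, while the exploration bonus still sends it back to the others occasionally.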

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.03046/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1907.03046/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/1907.03046/full.md

---
Source: https://tomesphere.com/paper/1907.03046