# Maximum Causal Entropy Specification Inference from Demonstrations

**Authors:** Marcell Vazquez-Chanlatte, Sanjit A. Seshia

arXiv: 1907.11792 · 2020-05-19

## TL;DR

This paper infers Boolean task specifications from demonstrations by adapting maximum causal entropy inverse reinforcement learning, yielding task representations that compose in a well-defined way and explicitly capture history dependence.

## Contribution

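The paper adapts maximum causal entropy inverse reinforcement learning (IRL) to estimate the posterior probability of a Boolean specification given a multi-set of demonstrations. It encodes the time-unrolled Markov Decision Process with reduced ordered binary decision diagrams, which keeps the computation tractable.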

## Key findings

- A binary decision diagram encoding turns a naive exponential-time inference algorithm into a polynomial-time one.
- Boolean specifications explicitly capture non-Markovian, history-dependent tasks.
- Learned specifications admit well-defined composition.
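
To make the last two findings concrete, here is a minimal Python sketch of Boolean specifications as predicates over whole traces. It is an illustration only, not the paper's representation: the names `Spec`, `eventually`, `never`, and `conj`, and the labels `"goal"` and `"lava"`, are hypothetical.

```python
from typing import Callable, Sequence

# Assumption: a trace is a sequence of state labels, and a specification
# is any Boolean predicate over the whole trace (hence non-Markovian).
Trace = Sequence[str]
Spec = Callable[[Trace], bool]


def eventually(label: str) -> Spec:
    """Satisfied if `label` appears anywhere in the trace."""
    return lambda trace: label in trace


def never(label: str) -> Spec:
    """Satisfied if `label` appears nowhere in the trace."""
    return lambda trace: label not in trace


def conj(left: Spec, right: Spec) -> Spec:
    """Compose two specifications by logical conjunction."""
    return lambda trace: left(trace) and right(trace)


# Hypothetical task: reach the goal without ever touching lava.
reach_goal_safely = conj(eventually("goal"), never("lava"))
```

Because each specification is judged on the full trace, the verdict can depend on events arbitrarily far in the past, and combining two tasks is ordinary Boolean conjunction rather than an ad hoc sum of rewards.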

## Abstract

In many settings (e.g., robotics) demonstrations provide a natural way to specify tasks; however, most methods for learning from demonstrations either do not provide guarantees that the artifacts learned for the tasks, such as rewards or policies, can be safely composed and/or do not explicitly capture history dependencies. Motivated by this deficit, recent works have proposed learning Boolean task specifications, a class of Boolean non-Markovian rewards which admit well-defined composition and explicitly handle historical dependencies. This work continues this line of research by adapting maximum causal entropy inverse reinforcement learning to estimate the posterior probability of a specification given a multi-set of demonstrations. The key algorithmic insight is to leverage the extensive literature and tooling on reduced ordered binary decision diagrams to efficiently encode a time-unrolled Markov Decision Process. This enables transforming a naive exponential-time algorithm into a polynomial-time algorithm.
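
The following Python sketch illustrates the naive, exponential-time side of that comparison: a standard maximum causal entropy (soft Bellman) backup computed by enumerating every history of a time-unrolled MDP. It is not the paper's binary decision diagram algorithm. The four-state chain, the `SLIP` probability, the `rationality` weight, and all function names are assumptions made for illustration.

```python
import math
from typing import Callable, Dict, Tuple

Trace = Tuple[int, ...]          # history of visited states
Spec = Callable[[Trace], bool]   # Boolean, possibly non-Markovian, reward

ACTIONS = (-1, +1)   # hypothetical actions: move left or right on a chain
SLIP = 0.1           # hypothetical chance that a move has no effect
LAST_STATE = 3       # hypothetical chain with states 0..3


def step_distribution(state: int, action: int) -> Dict[int, float]:
    """Next-state distribution for the toy chain dynamics."""
    intended = min(max(state + action, 0), LAST_STATE)
    dist: Dict[int, float] = {}
    dist[intended] = dist.get(intended, 0.0) + (1.0 - SLIP)
    dist[state] = dist.get(state, 0.0) + SLIP
    return dist


def soft_q(trace: Trace, action: int, horizon: int,
           spec: Spec, rationality: float) -> float:
    """Expected soft value over the environment's next-state choice."""
    return sum(
        prob * soft_value(trace + (nxt,), horizon, spec, rationality)
        for nxt, prob in step_distribution(trace[-1], action).items()
    )


def soft_value(trace: Trace, horizon: int,
               spec: Spec, rationality: float) -> float:
    """Soft Bellman backup; recursion visits every history (exponential)."""
    if len(trace) - 1 == horizon:
        # Leaf of the unrolled tree: reward is the Boolean verdict on the trace.
        return rationality * float(spec(trace))
    q_values = [soft_q(trace, a, horizon, spec, rationality) for a in ACTIONS]
    top = max(q_values)  # subtract the max for a stable log-sum-exp
    return top + math.log(sum(math.exp(q - top) for q in q_values))


def policy(trace: Trace, horizon: int,
           spec: Spec, rationality: float) -> Dict[int, float]:
    """Maximum causal entropy policy: pi(a | history) = exp(Q - V)."""
    value = soft_value(trace, horizon, spec, rationality)
    return {
        a: math.exp(soft_q(trace, a, horizon, spec, rationality) - value)
        for a in ACTIONS
    }


def reaches_end(trace: Trace) -> bool:
    """Hypothetical specification: the last chain state is visited at some point."""
    return LAST_STATE in trace
```

Calling `policy((2,), 1, reaches_end, 2.0)` returns action probabilities that favor moving right, and setting `rationality` to `0.0` yields a uniform policy. The recursion revisits every action and outcome sequence, so its cost grows exponentially with the horizon; that blow-up is what the abstract says the decision diagram encoding avoids.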

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.11792/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/1907.11792/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/1907.11792/full.md

---
Source: https://tomesphere.com/paper/1907.11792