# Entropic Risk Measure in Policy Search

**Authors:** David Nass, Boris Belousov, and Jan Peters

arXiv: 1906.09090 · 2019-09-04

## TL;DR

This paper introduces an entropic risk measure into policy gradient methods to account for variability in performance, improving robustness in stochastic robotic environments, demonstrated through simulations and a real robot task.

## Contribution

The paper extends policy gradient algorithms by adding the entropic risk measure to the objective function, so that optimization accounts for the variability of policy performance, which most earlier approaches do not model explicitly.
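For orientation, the entropic risk measure is commonly written as follows. This is the standard textbook form, not necessarily the notation or sign convention of the paper; the symbols $\beta$ (risk parameter), $R(\tau)$ (return of trajectory $\tau$), and $p_\theta$ (trajectory distribution under policy parameters $\theta$) are assumptions introduced here for illustration.

$$
J_\beta(\theta) = \frac{1}{\beta} \log \mathbb{E}_{\tau \sim p_\theta}\!\left[ \exp\big(\beta R(\tau)\big) \right]
= \mathbb{E}[R] + \frac{\beta}{2}\,\mathrm{Var}[R] + O(\beta^2)
$$

Under this convention, $\beta < 0$ penalizes variance of the return (risk-averse), $\beta > 0$ rewards it (risk-seeking), and $\beta \to 0$ recovers the usual expected-return objective. Applying the likelihood-ratio trick gives a gradient that reweights trajectories exponentially by their return:

$$
\nabla_\theta J_\beta(\theta) = \frac{1}{\beta}\,
\frac{\mathbb{E}_{\tau \sim p_\theta}\!\left[ \exp\big(\beta R(\tau)\big)\, \nabla_\theta \log p_\theta(\tau) \right]}
{\mathbb{E}_{\tau \sim p_\theta}\!\left[ \exp\big(\beta R(\tau)\big) \right]}
$$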

## Key findings

- Enhanced policy robustness to performance variability
- Successful application to a real-robot task of learning a hitting motion in robot badminton
- Improved stability in stochastic environments
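
To make "robustness to performance variability" concrete, here is a minimal sketch of how an entropic risk objective separates two return distributions with the same mean but different spread. It is an illustration under stated assumptions, not the authors' implementation: the function name `entropic_risk`, the parameter `beta`, the sign convention (negative `beta` is risk-averse for returns), and the Gaussian sample data are all hypothetical.

```python
import numpy as np


def entropic_risk(returns, beta):
    """Sample estimate of (1/beta) * log E[exp(beta * R)].

    Assumed convention: R is a return (higher is better), so
    beta < 0 is risk-averse and beta > 0 is risk-seeking.
    """
    returns = np.asarray(returns, dtype=float)
    if beta == 0.0:
        # Limit beta -> 0 is the ordinary expected return.
        return returns.mean()
    z = beta * returns
    m = z.max()
    # Log-mean-exp with the maximum subtracted for numerical stability.
    log_mean_exp = m + np.log(np.mean(np.exp(z - m)))
    return log_mean_exp / beta


# Hypothetical returns from two policies with equal mean, different variance.
rng = np.random.default_rng(0)
steady = rng.normal(loc=1.0, scale=0.1, size=10_000)
erratic = rng.normal(loc=1.0, scale=1.0, size=10_000)

risk_steady = entropic_risk(steady, beta=-1.0)
risk_erratic = entropic_risk(erratic, beta=-1.0)
```

With a risk-averse `beta`, the high-variance policy scores lower than the low-variance one even though their average returns are nearly identical, which is the behavior an expected-return objective cannot express.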

## Abstract

With the increasing pace of automation, modern robotic systems need to act in stochastic, non-stationary, partially observable environments. A range of algorithms for finding parameterized policies that optimize for long-term average performance have been proposed in the past. However, the majority of the proposed approaches do not explicitly take into account the variability of the performance metric, which may lead to finding policies that, although performing well on average, can perform spectacularly badly in a particular run or over a period of time. To address this shortcoming, we study an approach to policy optimization that explicitly takes into account higher order statistics of the reward function. In this paper, we extend policy gradient methods to include the entropic risk measure in the objective function and evaluate their performance in simulation experiments and on a real-robot task of learning a hitting motion in robot badminton.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.09090/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/1906.09090/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/1906.09090/full.md

---
Source: https://tomesphere.com/paper/1906.09090