# Chance-Constrained Trajectory Optimization for Non-linear Systems with Unknown Stochastic Dynamics

**Authors:** Onur Celik, Hany Abdulsamad, Jan Peters

arXiv: 1906.11003 · 2019-08-01

## TL;DR

This paper introduces a chance-constrained trajectory optimization method for non-linear systems with unknown stochastic dynamics, improving robustness and avoiding premature convergence in model-based reinforcement learning.

## Contribution

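The paper models the physical limits on state and action as probabilistic chance constraints that are linear in both state and action, and integrates them into trajectory optimization by optimizing a relaxed quadratic program. The setting is online-fitted stochastic linear dynamics.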

## Key findings

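- Empirical evaluations show a significant improvement in learning robustness.
- Accounting for physical state-action limits is meant to stop the optimization from moving towards unreachable areas of the state-action space.
- The approach performs more effective updates and avoids the premature convergence observed in state-of-the-art algorithms.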

## Abstract

Iterative trajectory optimization techniques for non-linear dynamical systems are among the most powerful and sample-efficient methods of model-based reinforcement learning and approximate optimal control. By leveraging time-variant local linear-quadratic approximations of system dynamics and reward, such methods can find both a target-optimal trajectory and time-variant optimal feedback controllers. However, the local linear-quadratic assumptions are a major source of optimization bias that leads to catastrophic greedy updates, raising the issue of proper regularization. Moreover, the approximate models' disregard for any physical state-action limits of the system causes further aggravation of the problem, as the optimization moves towards unreachable areas of the state-action space. In this paper, we address the issue of constrained systems in the scenario of online-fitted stochastic linear dynamics. We propose modeling state and action physical limits as probabilistic chance constraints linear in both state and action and introduce a new trajectory optimization technique that integrates these probabilistic constraints by optimizing a relaxed quadratic program. Our empirical evaluations show a significant improvement in learning robustness, which enables our approach to perform more effective updates and avoid premature convergence observed in state-of-the-art algorithms.
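To make the idea of a linear chance constraint concrete, here is a minimal sketch of the standard deterministic reformulation for a Gaussian variable. It assumes the stacked state-action vector is Gaussian with known mean and covariance, in which case `Pr(a^T x <= b) >= 1 - delta` holds exactly when `a^T mean + z * sqrt(a^T cov a) <= b`, with `z` the standard normal quantile at `1 - delta`. The function names and the Gaussian assumption are illustrative and are not taken from the paper, whose relaxed quadratic program may treat the constraint differently.

```python
import math
from statistics import NormalDist


def chance_constraint_margin(a, mean, cov, b, delta):
    """Slack of the Gaussian chance constraint Pr(a^T x <= b) >= 1 - delta.

    Hypothetical helper: a non-negative return value means the constraint
    holds under the assumption x ~ N(mean, cov).
    """
    n = len(a)
    # Mean of the scalar a^T x.
    lin = sum(a[i] * mean[i] for i in range(n))
    # Variance of the scalar a^T x, i.e. a^T cov a.
    var = sum(a[i] * cov[i][j] * a[j] for i in range(n) for j in range(n))
    # Standard normal quantile at the required confidence level.
    z = NormalDist().inv_cdf(1.0 - delta)
    return b - (lin + z * math.sqrt(max(var, 0.0)))


def satisfies_chance_constraint(a, mean, cov, b, delta):
    """True if the limit a^T x <= b holds with probability at least 1 - delta."""
    return chance_constraint_margin(a, mean, cov, b, delta) >= 0.0
```

For example, with a unit-variance first coordinate centred at zero, a limit of `b = 2.0` is satisfied at `delta = 0.05`, while `b = 1.0` is not, because the 95% quantile of a standard normal is roughly 1.645. Tightening `delta` shrinks the feasible region, which is how such a constraint trades safety against conservatism.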

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.11003/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1906.11003/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/1906.11003/full.md

---
Source: https://tomesphere.com/paper/1906.11003