# Conditioning of Reinforcement Learning Agents and its Policy Regularization Application

**Authors:** Arip Asadulaev, Igor Kuznetsov, Gideon Stein, Andrey Filchenkov

arXiv: 1906.05437 · 2020-07-15

## TL;DR

This paper investigates how Jacobian conditioning regularization influences the stability and generalization of reinforcement learning policies, introducing a new regularization method and evaluating its effectiveness on continuous control and generalization tasks.

## Contribution

To the authors' knowledge, it is the first study of Jacobian conditioning in reinforcement learning. It proposes a conditioning regularization algorithm aimed at improving policy stability and generalization.

## Key findings

- Conditioning regularization improves policy stability.
- Regularization enhances generalization to unseen levels.
- Proposed method outperforms baselines on control tasks.

## Abstract

The effect of regularizing Jacobian singular values has been studied for supervised learning problems. It has also been shown that Jacobian conditioning regularization can help to avoid the "mode-collapse" problem in Generative Adversarial Networks. In this paper, we try to answer the following question: Can information about policy conditioning help to shape a more stable and general policy for reinforcement learning agents? To answer this question, we conduct a study of Jacobian conditioning behavior during policy optimization. To the best of our knowledge, this is the first work that studies the condition number in reinforcement learning agents. We propose a conditioning regularization algorithm and test its performance on a range of continuous control tasks. Finally, we compare algorithms on the CoinRun environment with separate train and test levels to analyze how conditioning regularization contributes to agents' generalization.
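
To make the notion of policy conditioning concrete, here is a minimal sketch of how the condition number of a policy's input-output Jacobian could be measured and turned into a penalty. It is not the paper's algorithm: the two-layer `policy`, the finite-difference `policy_jacobian`, the log-based `conditioning_penalty`, and the `weight` value are all assumptions chosen for illustration.

```python
import numpy as np


def policy(state, params):
    """Hypothetical two-layer tanh policy mapping a state to an action."""
    w1, b1, w2, b2 = params
    hidden = np.tanh(w1 @ state + b1)
    return np.tanh(w2 @ hidden + b2)


def policy_jacobian(state, params, eps=1e-5):
    """Central finite-difference Jacobian d(action)/d(state).

    Returns an array of shape (action_dim, state_dim).
    """
    state = np.asarray(state, dtype=float)
    columns = []
    for i in range(state.size):
        step = np.zeros_like(state)
        step[i] = eps
        diff = policy(state + step, params) - policy(state - step, params)
        columns.append(diff / (2 * eps))
    return np.stack(columns, axis=1)


def condition_number(jacobian):
    """Ratio of the largest to the smallest singular value."""
    singular_values = np.linalg.svd(jacobian, compute_uv=False)
    return singular_values[0] / singular_values[-1]


def conditioning_penalty(state, params, weight=0.01):
    """Illustrative penalty (an assumption, not the paper's loss):
    weight * log(condition number), which is zero when all singular
    values are equal and grows as the Jacobian becomes ill-conditioned.
    """
    return weight * np.log(condition_number(policy_jacobian(state, params)))


# Example: a random policy with 4 state dimensions and 2 action dimensions.
rng = np.random.default_rng(0)
params = (
    rng.normal(size=(8, 4)), np.zeros(8),
    rng.normal(size=(2, 8)), np.zeros(2),
)
state = rng.normal(size=4)
jacobian = policy_jacobian(state, params)
kappa = condition_number(jacobian)
penalty = conditioning_penalty(state, params)
```

In a training loop, a term like `penalty` would be added to the policy objective and differentiated with an autodiff framework. Finite differences are used here only to keep the sketch dependency-light.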

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.05437/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1906.05437/full.md

## References

15 references — full list in the complete paper: https://tomesphere.com/paper/1906.05437/full.md

---
Source: https://tomesphere.com/paper/1906.05437