On Robust Reinforcement Learning with Lipschitz-Bounded Policy Networks
Nicholas H. Barbara, Ruigang Wang, Ian R. Manchester

TL;DR
This paper studies how Lipschitz-bounded policy networks make deep reinforcement learning more robust to disturbances, random noise, and adversarial attacks. It shows that expressive Lipschitz layers such as the Sandwich layer improve robustness without sacrificing clean performance, whereas spectral normalization is too conservative.
Contribution
The paper demonstrates empirically, on pendulum swing-up and Atari Pong, that Lipschitz-bounded policy parameterizations improve robustness in reinforcement learning, and it compares Lipschitz layer structures, including spectral normalization and the Sandwich layer.
Findings
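Lipschitz-bounded policies are more robust to disturbances, random noise, and targeted adversarial attacks than unconstrained MLP or CNN policies.
Sandwich layers outperform spectral normalization, which is too conservative and severely impacts clean performance.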
Smaller Lipschitz bounds lead to increased robustness.
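For reference, the Lipschitz bound discussed in these findings can be stated as follows. The notation is an assumption made for this summary, not taken from the paper: \pi is the policy network mapping an observation to an action, and \gamma is its Lipschitz bound.

```latex
\[
  \|\pi(x_1) - \pi(x_2)\| \;\le\; \gamma \,\|x_1 - x_2\|
  \quad \text{for all observations } x_1, x_2 .
\]
% Consequence: a perturbation \delta of the observation can change
% the action by at most \gamma \|\delta\|.
\[
  \|\pi(x + \delta) - \pi(x)\| \;\le\; \gamma \,\|\delta\| .
\]
```

Under this definition, a smaller \gamma limits how far noise or an adversarial perturbation of the observation can move the policy's output, which is consistent with the finding that smaller bounds give more robustness.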
Abstract
This paper presents a study of robust policy networks in deep reinforcement learning. We investigate the benefits of policy parameterizations that naturally satisfy constraints on their Lipschitz bound, analyzing their empirical performance and robustness on two representative problems: pendulum swing-up and Atari Pong. We illustrate that policy networks with smaller Lipschitz bounds are more robust to disturbances, random noise, and targeted adversarial attacks than unconstrained policies composed of vanilla multi-layer perceptrons or convolutional neural networks. However, the structure of the Lipschitz layer is important. We find that the widely-used method of spectral normalization is too conservative and severely impacts clean performance, whereas more expressive Lipschitz layers such as the recently-proposed Sandwich layer can achieve improved robustness without sacrificing clean…
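To make the idea of a Lipschitz-bounded policy concrete, here is a minimal NumPy sketch of the spectral normalization baseline mentioned in the abstract. It is an illustration under stated assumptions, not the authors' implementation. The layer sizes, the bound `gamma = 2.0`, and the helper names `spectral_normalize`, `make_policy`, and `policy` are all hypothetical. Weights are normalized once at initialization here, whereas a trained policy would need the normalization reapplied as the weights change. The Sandwich layer uses a different, more expressive parameterization that this sketch does not cover.

```python
import numpy as np

def spectral_normalize(W, target):
    """Rescale W so its spectral norm (largest singular value) equals target."""
    sigma = np.linalg.norm(W, ord=2)
    return W if sigma == 0 else W * (target / sigma)

def make_policy(sizes, gamma, rng):
    """Build an MLP whose layer spectral norms multiply to gamma (hypothetical helper)."""
    n_layers = len(sizes) - 1
    per_layer = gamma ** (1.0 / n_layers)  # split the bound evenly across layers
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.standard_normal((n_out, n_in))
        params.append((spectral_normalize(W, per_layer), np.zeros(n_out)))
    return params

def policy(params, x):
    """Forward pass; ReLU is 1-Lipschitz, so the network bound is the product of layer norms."""
    for W, b in params[:-1]:
        x = np.maximum(W @ x + b, 0.0)
    W, b = params[-1]
    return W @ x + b

gamma = 2.0                                   # assumed Lipschitz bound
rng = np.random.default_rng(0)
params = make_policy([3, 32, 32, 1], gamma, rng)  # assumed layer sizes

# Certified bound: product of the per-layer spectral norms.
bound = np.prod([np.linalg.norm(W, ord=2) for W, _ in params])

# Empirical check: perturb an observation and measure the output change.
x = rng.standard_normal(3)
delta = 0.01 * rng.standard_normal(3)
ratio = np.linalg.norm(policy(params, x + delta) - policy(params, x)) / np.linalg.norm(delta)
print(f"certified bound: {bound:.3f}, observed ratio: {ratio:.3f}")
```

The observed ratio can never exceed the certified bound, and it is often much smaller. That gap is one way to see why a per-layer spectral norm constraint can be conservative.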
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
Methods: Spectral Normalization
