L2C2: Locally Lipschitz Continuous Constraint towards Stable and Smooth Reinforcement Learning
Taisuke Kobayashi

TL;DR
This paper introduces L2C2, a local Lipschitz continuity regularization for reinforcement learning that balances smoothness and expressiveness, leading to more stable policies and improved task performance.
Contribution
It proposes a novel local Lipschitz constraint (L2C2) for RL that maintains expressiveness while enhancing stability and smoothness of learned policies.
Findings
L2C2 outperforms existing methods in task performance.
L2C2 effectively smooths robot actions from learned policies.
Numerical simulations validate the stability and effectiveness of L2C2.
Abstract
This paper proposes a new regularization technique for reinforcement learning (RL) that makes the policy and value functions smooth and stable. RL is known for the instability of its learning process and the sensitivity of the acquired policy to noise. Several methods have been proposed to resolve these problems; in summary, they work mainly by making the learned policy and value functions smooth. However, if these functions are excessively smooth, they lose expressiveness and may fail to reach the globally optimal solution. This paper therefore considers RL under a local Lipschitz continuity constraint, called L2C2. By designing a spatio-temporal locally compact space for L2C2 from the state transition at each time step, moderate smoothness can be achieved without loss of expressiveness. Numerical noisy simulations verified that the…
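To make the idea of a local Lipschitz constraint built from the state transition concrete, here is a minimal NumPy sketch. It is not the paper's formulation: the sampling rule, the hinge-style penalty, and the names `sample_local_state`, `local_lipschitz_penalty`, `scale`, and `lipschitz_const` are all assumptions chosen for illustration. The sketch draws a perturbed state from a box spanned by the transition from `s` to `s_next`, then penalizes the function `f` (standing in for a policy or value network) only when its output changes faster than the allowed Lipschitz constant within that neighborhood.

```python
import numpy as np


def sample_local_state(s, s_next, rng, scale=1.0):
    """Sample a state near s, inside a box spanned by the transition s -> s_next.

    Assumption for illustration: the local neighborhood is an axis-aligned box
    whose half-width in each dimension is scale * |s_next - s|.
    """
    u = rng.uniform(-scale, scale, size=s.shape)
    return s + u * (s_next - s)


def local_lipschitz_penalty(f, s, s_next, rng, lipschitz_const=1.0, scale=1.0):
    """Hinge penalty on violations of ||f(s~) - f(s)|| <= L * ||s~ - s||.

    The penalty is zero whenever f already satisfies the bound at the sampled
    point, so f is only constrained locally, around visited transitions.
    """
    s_tilde = sample_local_state(s, s_next, rng, scale)
    out_gap = np.linalg.norm(f(s_tilde) - f(s))
    in_gap = np.linalg.norm(s_tilde - s)
    return max(0.0, out_gap - lipschitz_const * in_gap)


# Usage: a linear map with slope 2 violates L = 1 but satisfies L = 3.
rng = np.random.default_rng(0)
s = np.array([0.0, 1.0])
s_next = np.array([0.5, 0.8])
steep = lambda x: 2.0 * x
print(local_lipschitz_penalty(steep, s, s_next, rng, lipschitz_const=1.0))
print(local_lipschitz_penalty(steep, s, s_next, rng, lipschitz_const=3.0))
```

In a training loop, a term like this would typically be added to the RL loss with a weighting coefficient. Because the neighborhood shrinks when consecutive states are close, the constraint stays local and does not force the function to be smooth everywhere.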
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Smart Grid Energy Management
