Direct Soft-Policy Sampling via Langevin Dynamics
Donghyeon Ki, Hee-Jun Ahn, Kyungyoon Kim, Byung-Jun Lee

TL;DR
This paper introduces Noise-Conditioned Langevin Q-Learning (NC-LQL), a method that samples actions directly from soft (Boltzmann) policies in reinforcement learning by running Langevin dynamics with multi-scale noise, with the aim of improving exploration and performance.
Contribution
The paper proposes NC-LQL, which integrates multi-scale noise into Langevin dynamics for efficient soft-policy sampling without explicit policy parameterization.
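To make "soft-policy sampling without explicit policy parameterization" concrete, here is a minimal sketch of plain Langevin sampling from a Boltzmann policy pi(a|s) proportional to exp(Q(s, a) / alpha), using only the action gradient of Q. It is not the authors' implementation: the toy critic `q_value`, its analytic gradient `grad_q_action`, the temperature `ALPHA`, and all step counts and step sizes are assumptions chosen so the target distribution is a known Gaussian that the samples can be checked against.

```python
import numpy as np

ALPHA = 0.5  # temperature of the soft policy (assumed value)


def q_value(state, action):
    """Toy Q(s, a): a concave bowl peaked at tanh(state).

    Hypothetical stand-in for a learned critic. With this Q, the Boltzmann
    policy exp(Q / ALPHA) is a Gaussian with mean tanh(state), variance ALPHA.
    """
    return -0.5 * np.sum((action - np.tanh(state)) ** 2, axis=-1)


def grad_q_action(state, action):
    """Analytic gradient of the toy Q with respect to the action."""
    return np.tanh(state) - action


def langevin_sample(state, n_chains=4000, n_steps=300, step=0.05, seed=0):
    """Run unadjusted Langevin dynamics on the action, one chain per row."""
    rng = np.random.default_rng(seed)
    action = rng.standard_normal((n_chains, state.shape[-1]))
    for _ in range(n_steps):
        # Score of the Boltzmann policy: grad_a log pi = grad_a Q / alpha.
        score = grad_q_action(state, action) / ALPHA
        noise = rng.standard_normal(action.shape)
        action = action + 0.5 * step * score + np.sqrt(step) * noise
    return action


state = np.array([0.3, -1.2])
actions = langevin_sample(state)
print("sample mean:", actions.mean(axis=0), "target:", np.tanh(state))
print("sample var: ", actions.var(axis=0), "target:", ALPHA)
```

No policy network appears anywhere in the loop: the only learned object a real agent would need is the critic, whose action gradient would typically come from automatic differentiation rather than the closed form used here.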
Findings
NC-LQL achieves competitive results on MuJoCo benchmarks.
It enables effective exploration through multi-scale noise perturbations.
The method simplifies soft-policy sampling in reinforcement learning.
Abstract
Soft policies in reinforcement learning define policies as Boltzmann distributions over state-action value functions, providing a principled mechanism for balancing exploration and exploitation. However, realizing such soft policies in practice remains challenging. Existing approaches either depend on parametric policies with limited expressivity or employ diffusion-based policies whose intractable likelihoods hinder reliable entropy estimation in soft policy objectives. We address this challenge by directly realizing soft-policy sampling via Langevin dynamics driven by the action gradient of the Q-function. This perspective leads to Langevin Q-Learning (LQL), which samples actions from the target Boltzmann distribution without explicitly parameterizing the policy. However, directly applying Langevin dynamics suffers from slow mixing in high-dimensional and non-convex Q-landscapes,…
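The slow-mixing problem mentioned at the end of the abstract, and the general idea of using several noise scales to get around it, can be illustrated with a generic annealed Langevin sampler of the kind used in noise-conditioned score models. This is a sketch under stated assumptions, not the NC-LQL algorithm itself, whose details are cut off in the abstract above. The 1-D bimodal toy target (`MODES`, `S0`), the analytically smoothed score `smoothed_score`, the geometric noise schedule, and the step-size rule are all hypothetical choices made for illustration.

```python
import numpy as np

MODES = np.array([-2.0, 2.0])  # two equally good actions (toy assumption)
S0 = 0.3                       # width of each mode of exp(Q / alpha)


def smoothed_score(a, sigma):
    """Score of the toy Boltzmann density after Gaussian smoothing.

    Smoothing a mixture of N(mode, S0^2) with N(0, sigma^2) noise gives a
    mixture of N(mode, S0^2 + sigma^2), so the score is available in closed form.
    """
    v = S0**2 + sigma**2
    logits = -((a[:, None] - MODES[None, :]) ** 2) / (2.0 * v)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)            # mode responsibilities
    return (w * (MODES[None, :] - a[:, None])).sum(axis=1) / v


def langevin(a, sigma, n_steps, rng, c=0.1):
    """Unadjusted Langevin steps at one noise level; step scales with variance."""
    step = c * (S0**2 + sigma**2)
    for _ in range(n_steps):
        noise = rng.standard_normal(a.shape)
        a = a + 0.5 * step * smoothed_score(a, sigma) + np.sqrt(step) * noise
    return a


rng = np.random.default_rng(0)
start = np.full(2000, -2.0)  # every chain starts in the left mode

# Single-scale baseline: 1000 steps on the unsmoothed target.
plain = langevin(start, sigma=0.0, n_steps=1000, rng=rng)

# Multi-scale: same 1000-step budget, annealed from coarse to fine noise.
annealed = start.copy()
for sigma in np.geomspace(3.0, 0.01, 10):
    annealed = langevin(annealed, sigma, n_steps=100, rng=rng)

plain_frac = float(np.mean(plain > 0))
annealed_frac = float(np.mean(annealed > 0))
print("fraction of chains in right mode, single-scale:", plain_frac)
print("fraction of chains in right mode, multi-scale: ", annealed_frac)
```

In this toy setting the single-scale chains stay trapped in the mode they started in, while the annealed chains first mix over a heavily smoothed, nearly unimodal landscape and then settle into both modes as the noise shrinks. In a learned setting the smoothed score would have to come from a noise-conditioned model rather than a closed form.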
Taxonomy
Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
