Direct Soft-Policy Sampling via Langevin Dynamics

Donghyeon Ki; Hee-Jun Ahn; Kyungyoon Kim; Byung-Jun Lee

arXiv:2602.07873 · cs.LG · February 10, 2026

Open Access

TL;DR

This paper introduces Noise-Conditioned Langevin Q-Learning (NC-LQL), a method that samples actions directly from a soft (Boltzmann) policy in reinforcement learning by running Langevin dynamics with multi-scale noise, with the aim of improving exploration and performance.

Contribution

The paper proposes NC-LQL, which integrates multi-scale noise into Langevin dynamics for efficient soft-policy sampling without explicit policy parameterization.
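This page does not spell out how the multi-scale noise enters the sampler, so the following is only a generic illustration of the idea, not NC-LQL itself. It is a minimal sketch of annealed Langevin sampling on a one-dimensional toy target with two well-separated modes, where the score of the Gaussian-smoothed target is available in closed form. The target, the names `smoothed_score` and `annealed_langevin`, the noise schedule `sigmas`, and the step rule `0.1 * var` are all assumptions made for the example.

```python
import numpy as np

def smoothed_score(a, mode, var):
    """d/da log p(a) for p = 0.5*N(-mode, var) + 0.5*N(+mode, var)."""
    return (-a + mode * np.tanh(mode * a / var)) / var

def annealed_langevin(mode=3.0, base_var=0.1, sigmas=None,
                      steps_per_level=100, n_chains=2000, seed=0):
    """Langevin sampling over a decreasing sequence of noise scales (toy, hypothetical)."""
    rng = np.random.default_rng(seed)
    if sigmas is None:
        sigmas = np.geomspace(3.0, 0.01, num=10)  # assumed schedule, large to small
    a = np.full(n_chains, mode)  # every chain starts in the right-hand mode
    for sigma in sigmas:
        # Convolving the toy target with N(0, sigma^2) widens each mode.
        var = base_var + sigma ** 2
        step = 0.1 * var  # assumed step rule: shrink the step with the noise scale
        for _ in range(steps_per_level):
            noise = rng.standard_normal(n_chains)
            a = a + step * smoothed_score(a, mode, var) + np.sqrt(2.0 * step) * noise
    return a

# Multi-scale run: chains spread over both modes.
samples = annealed_langevin()
left_fraction = float(np.mean(samples < 0.0))

# Single-scale run with no smoothing: chains stay in the mode they started in.
plain = annealed_langevin(sigmas=[0.0], steps_per_level=1000)
plain_left_fraction = float(np.mean(plain < 0.0))
```

In this toy setting the large noise scales merge the two modes so chains can cross between them, and the small scales then sharpen the samples around each mode. A learned, noise-conditioned critic would have to replace the closed-form score in any realistic setting.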

Findings

01. NC-LQL achieves competitive results on MuJoCo benchmarks.

02. It enables effective exploration through multi-scale noise perturbations.

03. The method simplifies soft-policy sampling in reinforcement learning.

Abstract

Soft policies in reinforcement learning define policies as Boltzmann distributions over state-action value functions, providing a principled mechanism for balancing exploration and exploitation. However, realizing such soft policies in practice remains challenging. Existing approaches either depend on parametric policies with limited expressivity or employ diffusion-based policies whose intractable likelihoods hinder reliable entropy estimation in soft policy objectives. We address this challenge by directly realizing soft-policy sampling via Langevin dynamics driven by the action gradient of the Q-function. This perspective leads to Langevin Q-Learning (LQL), which samples actions from the target Boltzmann distribution without explicitly parameterizing the policy. However, directly applying Langevin dynamics suffers from slow mixing in high-dimensional and non-convex Q-landscapes,…
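The core idea in the abstract, sampling actions from a Boltzmann distribution by following the action gradient of Q plus injected noise, can be illustrated with a minimal sketch. It assumes a toy concave quadratic Q with an analytic gradient in place of a learned critic, and uses the standard unadjusted Langevin update. The names `q_action_grad` and `langevin_sample`, the temperature `alpha`, and the step size are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def q_action_grad(actions, target):
    """Action gradient of the toy critic Q(s, a) = -0.5 * ||a - target||^2."""
    return -(actions - target)

def langevin_sample(target, alpha=0.5, step=0.01, n_steps=500,
                    n_chains=4000, seed=0):
    """Unadjusted Langevin dynamics targeting pi(a|s) proportional to exp(Q(s, a) / alpha)."""
    rng = np.random.default_rng(seed)
    actions = rng.standard_normal((n_chains, target.shape[0]))
    for _ in range(n_steps):
        noise = rng.standard_normal(actions.shape)
        # The score of the Boltzmann target is grad_a Q(s, a) / alpha.
        drift = q_action_grad(actions, target) / alpha
        actions = actions + step * drift + np.sqrt(2.0 * step) * noise
    return actions

target = np.array([1.0, -2.0])
samples = langevin_sample(target)
```

For this quadratic Q the Boltzmann policy is a Gaussian centred at `target` with variance `alpha` per dimension, so the empirical mean and variance of `samples` give a quick sanity check. The finite step size introduces a small bias in the variance, which is a known property of the unadjusted scheme.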

Taxonomy

Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning