Unifying Entropy Regularization in Optimal Control: From and Back to Classical Objectives via Iterated Soft Policies and Path Integral Solutions

Ajinkya Bhole; Mohammad Mahmoudi Filabadi; Guillaume Crevecoeur; Tom Lefebvre

arXiv:2512.06109·math.OC·May 14, 2026

TL;DR

This paper presents a unified KL-regularized framework for various optimal control problems, connecting classical and soft-policy formulations through iterated solutions and path integral methods.

Contribution

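The paper introduces a generalized control formulation that penalizes policies and transitions with independent KL weights, unifies several control problems under this formulation, and extends the computational advantages of path integral solutions and linear Bellman operators to a broader class of problems.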

Findings

01. Unified KL-regularized control framework encompassing SOC and RSOC.

02. Iterated solutions recover original control objectives.

03. Path integral solutions and linear Bellman operators for synchronized KL weights.

Abstract

This paper develops a unified perspective on several optimal control formulations through the lens of Kullback-Leibler (KL) regularization. We propose a central problem that separates the KL penalties on policies and transitions with independent weights, thus generalizing the standard trajectory-level KL-regularization used in probabilistic optimal control. This umbrella formulation recovers various control problems: the classical Stochastic Optimal Control (SOC), Risk-Sensitive Stochastic Optimal Control (RSOC), and their policy-based KL-regularized counterparts, termed soft-policy SOC and RSOC, which yield tractable surrogates. Beyond being regularized variants, these soft-policy formulations majorize the original SOC and RSOC; thus, iterating their solutions recovers the original objectives. We further identify a synchronized case of soft-policy RSOC where the policy and transition…
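The majorization claim in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a small random tabular MDP with discounted costs, a single KL weight `lam` on the policy only, and the hypothetical helper names `soft_policy_solve`, `classical_cost`, and `optimal_cost`. Each outer step solves a soft-policy problem (cost plus `lam` times the KL divergence from a prior policy `rho`) and then reuses the solution as the next prior. Because the regularized objective upper-bounds the unregularized cost and is tight at the prior, the classical cost of the iterates should not increase.

```python
import numpy as np

def soft_policy_solve(P, c, rho, lam, gamma, n_iter=500):
    """Soft value iteration for discounted cost + lam * KL(pi || rho)."""
    V = np.zeros(P.shape[0])
    for _ in range(n_iter):
        Q = c + gamma * P @ V                 # state-action costs, shape (S, A)
        m = Q.min(axis=1, keepdims=True)      # shift for numerical stability
        w = rho * np.exp(-(Q - m) / lam)      # unnormalized soft policy
        V = m[:, 0] - lam * np.log(w.sum(axis=1))
    return w / w.sum(axis=1, keepdims=True)

def classical_cost(P, c, pi, gamma):
    """Exact unregularized discounted cost of policy pi, per start state."""
    P_pi = np.einsum("sa,sat->st", pi, P)
    c_pi = (pi * c).sum(axis=1)
    return np.linalg.solve(np.eye(len(c_pi)) - gamma * P_pi, c_pi)

def optimal_cost(P, c, gamma, n_iter=2000):
    """Classical value iteration (hard minimum over actions)."""
    V = np.zeros(P.shape[0])
    for _ in range(n_iter):
        V = (c + gamma * P @ V).min(axis=1)
    return V

# Assumed toy problem: 5 states, 3 actions, random dynamics and costs.
rng = np.random.default_rng(0)
S, A, gamma, lam = 5, 3, 0.9, 0.5
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
c = rng.random((S, A))

rho = np.full((S, A), 1.0 / A)                # start from a uniform prior
costs = [classical_cost(P, c, rho, gamma)]
for _ in range(30):
    rho = soft_policy_solve(P, c, rho, lam, gamma)   # prior <- soft solution
    costs.append(classical_cost(P, c, rho, gamma))

V_star = optimal_cost(P, c, gamma)
print("largest gap to classical optimum:", np.max(costs[-1] - V_star))
```

The sketch covers only a discrete, risk-neutral setting; it does not attempt the transition-level KL penalty, the risk-sensitive case, or the synchronized path integral case mentioned in the abstract.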
