Bidirectional Soft Actor-Critic: Leveraging Forward and Reverse KL Divergence for Efficient Reinforcement Learning

Yixian Zhang; Huaze Tang; Changxu Wei; Wenbo Ding

arXiv:2506.01639·cs.LG·June 3, 2025

Open Access

TL;DR

This paper introduces Bidirectional SAC, a reinforcement learning algorithm that combines forward and reverse KL divergences to improve policy optimization, resulting in better performance and sample efficiency in continuous control tasks.

Contribution

It proposes a novel bidirectional approach that explicitly leverages forward KL for initialization and reverse KL for refinement, enhancing stability and efficiency.

Findings

01. Achieves up to 30% higher episodic rewards

02. Outperforms standard SAC and baselines

03. Improves sample efficiency significantly

Abstract

The Soft Actor-Critic (SAC) algorithm, a state-of-the-art method in maximum entropy reinforcement learning, traditionally relies on minimizing reverse Kullback-Leibler (KL) divergence for policy updates. However, this approach leads to an intractable optimal projection policy, necessitating gradient-based approximations that can suffer from instability and poor sample efficiency. This paper investigates the alternative use of forward KL divergence within SAC. We demonstrate that for Gaussian policies, forward KL divergence yields an explicit optimal projection policy -- corresponding to the mean and variance of the target Boltzmann distribution's action marginals. Building on the distinct advantages of both KL directions, we propose Bidirectional SAC, an algorithm that first initializes the policy using the explicit forward KL projection and then refines it by optimizing the reverse KL…
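The explicit forward KL projection described in the abstract amounts to moment matching: the best-fitting Gaussian takes the mean and variance of the Boltzmann target. A minimal sketch of that idea follows. It is not the authors' implementation. It assumes a single fixed state, candidate actions drawn uniformly from a box, a temperature `alpha`, and a hypothetical quadratic Q-function chosen only so the answer is known in advance. The function name `forward_kl_gaussian_projection` is likewise illustrative.

```python
import numpy as np


def forward_kl_gaussian_projection(actions, q_values, alpha=1.0):
    """Moment-match a diagonal Gaussian to a target proportional to exp(Q / alpha).

    actions:  (n, d) candidate actions, assumed drawn uniformly over the action box.
    q_values: (n,) values Q(s, a) for one fixed state s.
    Returns the per-dimension weighted mean and variance.
    """
    actions = np.asarray(actions, dtype=float)
    logits = np.asarray(q_values, dtype=float) / alpha
    logits = logits - logits.max()      # subtract the max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()            # self-normalised importance weights
    mean = weights @ actions            # shape (d,)
    var = weights @ (actions - mean) ** 2
    return mean, var


# Toy check with a hypothetical quadratic Q, so the Boltzmann target is itself
# a Gaussian with mean 0.5 and standard deviation 0.2 when alpha = 1.
rng = np.random.default_rng(0)
actions = rng.uniform(-3.0, 3.0, size=(200_000, 1))
q_values = -0.5 * ((actions[:, 0] - 0.5) / 0.2) ** 2
mean, var = forward_kl_gaussian_projection(actions, q_values, alpha=1.0)
print(mean, var)
```

In this toy setting the estimates should land close to a mean of 0.5 and a variance of 0.04. In the algorithm the abstract outlines, a projection of this kind would serve only as the initialization, with reverse KL optimization refining the policy afterwards.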

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Memory and Neural Computing