Generative Actor-Critic with Soft Bridge Policies

Ke He; Le He; Shunpu Tang; Yafei Wang; Lisheng Fan

arXiv:2605.08733·cs.LG·May 12, 2026

TL;DR

This paper introduces SoftGAC, a generative actor-critic method for MaxEnt reinforcement learning that combines an expressive policy with a tractable objective, achieving strong performance at low inference cost.

Contribution

The paper proposes SoftGAC, whose structured bridge policy exposes a tractable MaxEnt objective and generates actions in a single actor forward pass, improving efficiency over existing generative policy methods.

Findings

01. SoftGAC outperforms or matches strong baselines on continuous control tasks.

02. SoftGAC maintains low-latency, single-pass actor inference.

03. SoftGAC achieves better compute-return tradeoffs than diffusion and flow policies.
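
To make the inference-cost contrast in these findings concrete, the toy sketch below counts network evaluations for an iterative sampler and for a single-pass actor. It is an illustration only, not SoftGAC's architecture: `CountingMLP`, `iterative_flow_action`, `single_pass_action`, the layer sizes, and the step count of 10 are all assumptions introduced here, and the networks are random and untrained.

```python
import numpy as np


class CountingMLP:
    """Tiny random MLP (hypothetical) that counts its forward passes."""

    def __init__(self, in_dim, out_dim, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(in_dim, hidden))
        self.w2 = rng.normal(scale=0.1, size=(hidden, out_dim))
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return np.tanh(x @ self.w1) @ self.w2


def iterative_flow_action(velocity_net, state, action_dim, num_steps, rng):
    """Flow-style sampler: Euler-integrate a velocity field from noise.

    Every step re-evaluates the same shared-parameter network.
    """
    a = rng.normal(size=action_dim)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = np.array([k * dt])
        a = a + dt * velocity_net(np.concatenate([state, a, t]))
    return a


def single_pass_action(actor_net, state, action_dim, rng):
    """Single-pass actor: map (state, noise) to an action in one evaluation."""
    z = rng.normal(size=action_dim)
    return np.tanh(actor_net(np.concatenate([state, z])))


state_dim, action_dim, num_steps = 4, 2, 10  # assumed toy sizes
rng = np.random.default_rng(0)
state = rng.normal(size=state_dim)

velocity_net = CountingMLP(state_dim + action_dim + 1, action_dim)
actor_net = CountingMLP(state_dim + action_dim, action_dim, seed=1)

a_iter = iterative_flow_action(velocity_net, state, action_dim, num_steps, rng)
a_single = single_pass_action(actor_net, state, action_dim, rng)

print("iterative sampler forward passes:", velocity_net.calls)
print("single-pass actor forward passes:", actor_net.calls)
```

In this toy setup the iterative sampler's cost grows linearly with `num_steps`, and differentiating through it would require keeping every intermediate activation, whereas the single-pass actor evaluates its network once per action.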

Abstract

Expressive generative policies such as diffusion and flow models are appealing for MaxEnt online reinforcement learning because of their ability to model multimodal and highly non-Gaussian action distributions. However, training effective soft generative policies faces two obstacles that often arise together. First, marginal action densities are often unavailable, so existing methods typically rely on entropy bounds, heuristic proxies or approximations. Second, iterative shared-parameter samplers raise inference cost and require backpropagation through time over repeated network evaluations, increasing memory cost and destabilizing policy optimization. These obstacles motivate us to seek a generative policy that exposes a tractable MaxEnt objective while requiring only a single sampled actor forward pass for action generation. To this end, we propose soft generative actor-critic…
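
For reference, the first obstacle can be written out with the standard MaxEnt RL objective. The notation below (temperature $\alpha$, discount $\gamma$, step count $K$, intermediate samples $a_0, \dots, a_K$) is assumed here for illustration and is not taken from the paper.

```latex
% Standard MaxEnt objective: expected return plus a policy-entropy bonus.
J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[ \sum_{t=0}^{\infty} \gamma^{t}
  \Big( r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ \log \pi(a \mid s) \big].

% For a stochastic K-step sampler, the density of the final action a = a_K
% is a marginal over all intermediate samples:
\pi(a_K \mid s) = \int p(a_0) \prod_{k=1}^{K} p(a_k \mid a_{k-1}, s)\, \mathrm{d}a_{0:K-1}.
```

The entropy term needs $\log \pi(a \mid s)$, but for a multi-step generative sampler this marginal generally has no closed form, which is why bounds, proxies, or approximations are commonly used in its place.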
