Latent Spherical Flow Policy for Reinforcement Learning with Combinatorial Actions

Lingkai Kong; Anagha Satish; Hezi Jiang; Akseli Kangaslahti; Andrew Ma; Wenbo Chen; Mingxiao Song; Lily Xu; Milind Tambe

arXiv:2601.22211·cs.LG·February 2, 2026

Open Access

TL;DR

This paper introduces LSFlow, a novel reinforcement learning policy that uses a latent spherical flow to generate feasible combinatorial actions, improving expressiveness and efficiency in complex decision-making tasks.

Contribution

We propose a solver-induced latent spherical flow policy (LSFlow) that combines generative policy expressiveness with feasibility guarantees for combinatorial RL.

Findings

01. Outperforms state-of-the-art baselines by 20.6% on average

02. Learns a stochastic policy in a compact latent space

03. Uses a smoothed Bellman operator for stable learning

Abstract

Reinforcement learning (RL) with combinatorial action spaces remains challenging because feasible action sets are exponentially large and governed by complex feasibility constraints, making direct policy parameterization impractical. Existing approaches embed task-specific value functions into constrained optimization programs or learn deterministic structured policies, sacrificing generality and policy expressiveness. We propose a solver-induced latent spherical flow policy that brings the expressiveness of modern generative policies to combinatorial RL while guaranteeing feasibility by design. Our method, LSFlow, learns a stochastic policy in a compact continuous latent space via spherical flow matching, and delegates feasibility to a combinatorial optimization solver that maps each latent sample to a valid structured action. To improve efficiency, we train the value…
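
The sampling pipeline described in the abstract (a latent point on a sphere, transported by a flow, then mapped to a feasible action by a solver) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the feasible set is "choose exactly k of d items", the solver is a top-k selection that maximizes a linear objective, and `toy_velocity` is a hypothetical stand-in for a learned velocity network. The Euler step with tangent projection and renormalization is one common way to keep samples on the sphere, not necessarily the integrator the authors use.

```python
import math
import random


def sample_unit_sphere(dim, rng):
    """Draw a uniform point on the unit sphere by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def flow_step(z, velocity, dt):
    """One Euler step constrained to the sphere.

    The velocity is projected onto the tangent space at z, the point is
    moved, and the result is renormalized back to unit length.
    """
    radial = sum(zi * vi for zi, vi in zip(z, velocity))
    tangent = [vi - radial * zi for zi, vi in zip(z, velocity)]
    moved = [zi + dt * ti for zi, ti in zip(z, tangent)]
    norm = math.sqrt(sum(x * x for x in moved))
    return [x / norm for x in moved]


def toy_velocity(z, target):
    """Hypothetical stand-in for a learned velocity field: drift toward `target`."""
    return [ti - zi for zi, ti in zip(z, target)]


def top_k_solver(z, k):
    """Toy solver (assumed): maximize z . a over binary vectors a with exactly k ones."""
    chosen = sorted(range(len(z)), key=lambda i: z[i], reverse=True)[:k]
    action = [0] * len(z)
    for i in chosen:
        action[i] = 1
    return action


def sample_action(dim, k, target, steps, rng):
    """Sample a latent on the sphere, integrate the toy flow, then call the solver."""
    z = sample_unit_sphere(dim, rng)
    dt = 1.0 / steps
    for _ in range(steps):
        z = flow_step(z, toy_velocity(z, target), dt)
    return z, top_k_solver(z, k)


if __name__ == "__main__":
    rng = random.Random(0)
    target = sample_unit_sphere(6, rng)
    latent, action = sample_action(dim=6, k=2, target=target, steps=10, rng=rng)
    print("latent:", [round(x, 3) for x in latent])
    print("action:", action)
```

In this sketch the action always satisfies the cardinality constraint, whatever the latent sample is, because feasibility is enforced by the solver rather than by the policy. Stochasticity comes only from the initial draw on the sphere.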

Taxonomy

Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis