Beyond Distributions: Geometric Action Control for Continuous Reinforcement Learning

Zhihao Lin

arXiv:2511.08234·cs.AI·January 30, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces Geometric Action Control (GAC), a new method for continuous reinforcement learning that preserves the geometry of action spaces, simplifies computation, and outperforms existing approaches on multiple benchmarks.

Contribution

GAC offers a geometrically grounded, computationally efficient alternative to Gaussian and vMF policies, improving performance and simplicity in continuous RL control tasks.

Findings

01 GAC matches or exceeds state-of-the-art performance on MuJoCo benchmarks.

02 GAC reduces parameter count and computational complexity compared to vMF distributions.

03 Ablation studies confirm the importance of spherical normalization and adaptive concentration control.

Abstract

Gaussian policies have dominated continuous control in deep reinforcement learning (RL), yet they suffer from a fundamental mismatch: their unbounded support requires ad-hoc squashing functions that distort the geometry of bounded action spaces. While von Mises-Fisher (vMF) distributions offer a theoretically grounded alternative on the sphere, their reliance on Bessel functions and rejection sampling hinders practical adoption. We propose Geometric Action Control (GAC), a novel action generation paradigm that preserves the geometric benefits of spherical distributions while simplifying computation. GAC decomposes action generation into a direction vector and a learnable concentration parameter, enabling efficient interpolation between deterministic actions and uniform spherical noise. This design reduces parameter count from \(2d\) to \(d+1\), and avoids the \(O(dk)\)…
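The direction-plus-concentration decomposition described in the abstract can be illustrated with a rough sketch. This is not the paper's implementation: the function name `gac_action_sketch`, the sigmoid mapping from the concentration `kappa` to a mixing weight, and the `radius` argument (standing in for an action magnitude) are all assumptions made for illustration. The only parts taken from the abstract are the unit direction vector, the single scalar concentration, and the interpolation between a deterministic direction and uniform spherical noise.

```python
import numpy as np

def unit(v, eps=1e-8):
    """Project a vector onto the unit sphere (eps guards against division by zero)."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def gac_action_sketch(direction_raw, kappa, rng, radius=1.0):
    """Hypothetical sketch: blend a learned direction with uniform spherical noise.

    direction_raw: d raw outputs of a (hypothetical) direction head
    kappa:         1 scalar output of a (hypothetical) concentration head
    Together these are d + 1 outputs, matching the count given in the abstract.
    """
    mu = unit(direction_raw)
    # A normalized standard Gaussian sample is uniformly distributed on the sphere.
    xi = unit(rng.standard_normal(mu.shape))
    # ASSUMPTION: a sigmoid maps kappa to a mixing weight in (0, 1);
    # the paper's actual mapping may differ.
    w = 1.0 / (1.0 + np.exp(-kappa))
    # w near 1 gives a near-deterministic action; w near 0 gives near-uniform noise.
    mixed = w * mu + (1.0 - w) * xi
    # Re-normalize so the action stays on a sphere of the chosen radius.
    return radius * unit(mixed)

rng = np.random.default_rng(0)
action = gac_action_sketch(np.array([3.0, 0.0, 4.0]), kappa=2.0, rng=rng)
print(action, np.linalg.norm(action))
```

In this sketch, sampling needs only one Gaussian draw and two normalizations, with no Bessel function evaluation or rejection loop, which is the kind of simplification the abstract contrasts with vMF sampling.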

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 5

Strengths

The motivation of this work is novel and interesting, and it could be an important contribution to the RL community. GAC represents policies through two components: a direction network that outputs unit vectors indicating preferred action orientations, and a concentration network that controls exploration by interpolating between deterministic directions and uniform spherical noise. It takes a novel perspective on whether the distribution paradigm itself is necessary, which also promotes efficient expl

Weaknesses

More benchmarks may be needed to provide solid experimental results. Regarding policy distributions, I recommend that the authors also discuss work on discretized policy distributions: "Discretizing Continuous Action Space for On-Policy Optimization" (AAAI, 2020) and "Discretizing Continuous Action Space With Unimodal Probability Distributions for On-Policy Reinforcement Learning" (IEEE TNNLS, 2024).

Reviewer 02 · Rating 6 · Confidence 3

Strengths

The paper provides a strong motivation and a clear presentation of the algorithm. It also delivers a comprehensive experimental analysis, featuring multiple baseline comparisons, thorough ablation studies, and an in-depth examination of convergence behavior and the sampling landscape.

Weaknesses

The algorithm introduces an extra hyperparameter, action magnitude, but an analysis of its robustness across tasks is missing. Although the algorithm is well motivated, the performance improvement is limited and comes at the cost of this extra hyperparameter.

Reviewer 03 · Rating 4 · Confidence 4

Strengths

This paper is logically clear overall and presents an interesting perspective on action exploration in continuous control tasks. From this perspective, the paper reformulates the exploration distribution on a unit sphere. Compared to a policy with a von Mises-Fisher distribution as output, it reduces the sample complexity.

Weaknesses

This paper seems to overstate its contribution by comparing against a distribution that is not commonly used in RL, especially on the point about sample complexity. It would be better to give a clearer comparison between the proposed method and deterministic models for the "Geometric" feature, and between the proposed method and Gaussian-based models for sample complexity. The "Geometric" feature in this work is similar to deterministic models, which is not clearly stated in the paper.

Taxonomy

Topics: Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference · Adversarial Robustness in Machine Learning