Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces

Saket Tiwari; Omer Gottesman; George Konidaris

arXiv:2507.20853·cs.LG·July 29, 2025

TL;DR

This paper develops a geometric theoretical framework for understanding how neural reinforcement learning policies in continuous spaces induce low-dimensional manifolds of attainable states, linking manifold dimensionality to action space size.

Contribution

It introduces the first geometric analysis connecting the manifold of attainable states to the action space dimensionality in neural RL, supported by empirical validation.

Findings

01. The attainable state set forms a low-dimensional manifold.

02. The manifold's dimensionality is of the order of the action space dimension.

03. Empirical results validate the theoretical upper bound in MuJoCo environments.
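
To build intuition for the second finding, the sketch below uses a toy setup that is not taken from the paper: a hypothetical control-affine system with a 20-dimensional state and a 2-dimensional action, where the drift matrix, control matrix, start state, and step size `dt` are all made-up assumptions. It samples random two-layer policies, applies one Euler step from a fixed start state, and counts the significant singular values of the resulting states. This only illustrates the first-order picture (one step, linear dimension); it is not the paper's theorem, algorithm, or MuJoCo experiment.

```python
import numpy as np


def two_layer_policy(state, w1, w2):
    """Two-layer policy: action = W2 @ tanh(W1 @ state)."""
    return w2 @ np.tanh(w1 @ state)


def one_step_states(n_state=20, n_action=2, n_hidden=32,
                    n_policies=500, dt=0.01, seed=0):
    """States reached after one Euler step under randomly sampled policies.

    Assumed toy dynamics (hypothetical, for illustration only):
        s_next = s0 + dt * (drift @ s0 + control @ action)
    """
    rng = np.random.default_rng(seed)
    s0 = rng.normal(size=n_state)                    # fixed start state
    drift = rng.normal(size=(n_state, n_state))      # made-up drift term
    control = rng.normal(size=(n_state, n_action))   # made-up control matrix
    states = []
    for _ in range(n_policies):
        # Each iteration draws a fresh set of policy parameters.
        w1 = rng.normal(size=(n_hidden, n_state)) / np.sqrt(n_state)
        w2 = rng.normal(size=(n_action, n_hidden)) / np.sqrt(n_hidden)
        action = two_layer_policy(s0, w1, w2)
        states.append(s0 + dt * (drift @ s0 + control @ action))
    return np.array(states)


def linear_dimension(points, rel_tol=1e-8):
    """Count singular values of the centered point cloud above a relative tolerance."""
    centered = points - points.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return int(np.sum(singular_values > rel_tol * singular_values[0]))


states = one_step_states()
print("nominal state dimension:", states.shape[1])
print("estimated dimension of reached states:", linear_dimension(states))
```

In this toy, every reached state differs from the others only through `control @ action`, so the estimated dimension tracks `n_action` rather than `n_state`. Longer horizons and nonlinear dynamics bend this set into a curved manifold, which is the regime the paper analyses and which a linear SVD estimate would no longer capture faithfully.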

Abstract

Advances in reinforcement learning (RL) have led to its successful application in complex tasks with continuous state and action spaces. Despite these advances in practice, most theoretical work pertains to finite state and action spaces. We propose building a theoretical understanding of continuous state and action spaces by employing a geometric lens to understand the locally attained set of states. The set of all parametrised policies learnt through a semi-gradient based approach induces a set of attainable states in RL. We show that the training dynamics of a two-layer neural policy induce a low dimensional manifold of attainable states embedded in the high-dimensional nominal state space trained using an actor-critic algorithm. We prove that, under certain conditions, the dimensionality of this manifold is of the order of the dimensionality of the action space. This is the first…

Peer Reviews

Decision·ICLR 2025 Oral

Reviewer 01 · Rating 8 · Confidence 2

Strengths

The paper is well-written and rigorous. The analysis of the manifold hypothesis for RL using neural network policies is novel to my knowledge, and a significant contribution to the area.

Weaknesses

I don’t have any major comments on the weaknesses of the paper; I feel that the authors adequately mention the limitations in Section 6.

Minor:
- There were a few awkward or incomplete sentences that I did not understand:
  - L263-265: “In theoretical frameworks ….”
  - Caption of Figure 3
  - L462: “A common of a fully …”
  - The paragraph under Equation (14), and specifically the sentence in L475-478.
- The fonts in Figures 2, 3, and 4 were too small to read.

Reviewer 02 · Rating 8 · Confidence 2

Strengths

- Strong theoretical insights: The authors look at the geometry of RL dynamics in a continuous-time MDP with continuous spaces in order to link the dimensionality of the attainable state manifold to the action space. Theorem 1 is the main contribution in this regard. The theorem formally shows that the dimension of this manifold is related to the dimensionality of the agent's action space rather than the full state space. Specifically, the dimension of the manifold is approximately $2 \times (acti

Weaknesses

- The mathematical presentation is too complex: The derivation and presentation of the theoretical framework are mathematically dense, which could limit accessibility for a broader audience. The notation and terminology, especially in the sections on Lie series and vector fields, may be difficult for readers less familiar with differential geometry. It would be great if there could be some better insights following each main step.
- There is little discussion on scalability: While the results

Reviewer 03 · Rating 8 · Confidence 3

Strengths

By proving that RL agents operate within a low-dimensional manifold of attainable states, the authors offer a mathematically rigorous insight that connects the geometry of state spaces to the action dimensionality. This theoretical framework is supported by empirical evidence from simulated environments, such as MuJoCo, showing that training dynamics in RL indeed produce a low-dimensional representation. The paper presents a practical application by incorporating a manifold-learning layer in po

Weaknesses

The analysis assumes deterministic transitions and access to an exact value function, which is often impractical in dynamic, stochastic environments. The simulation setups may not fully capture the complexity and variability of real-world tasks where environmental noise and high-dimensional data structures can complicate learning dynamics. Lastly, the mathematical framework is limited to two-layer neural networks, which may oversimplify the behaviors of deeper architectures commonly used in mo

Videos

Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces · SlidesLive