Understanding and Improving Hyperbolic Deep Reinforcement Learning

Timo Klein; Thomas Lang; Andrii Shkabrii; Alexander Sturm; Kevin Sidak; Lukas Miklautz; Claudia Plant; Yllka Velaj; Sebastian Tschiatschek

arXiv:2512.14202·cs.LG·March 9, 2026

Open Access · 3 Reviews

TL;DR

This paper investigates the challenges of hyperbolic deep reinforcement learning, analyzes gradient instability issues, and introduces Hyper++, a new method that improves stability and performance in hyperbolic RL tasks.

Contribution

We identify key factors causing training failures in hyperbolic deep RL and propose Hyper++, a novel approach with regularization and stable training techniques.

Findings

01

Hyper++ achieves more stable learning on ProcGen and Atari-5.

02

It outperforms prior hyperbolic RL agents in efficiency and effectiveness.

03

Hyper++ reduces wall-clock time by approximately 30%.

Abstract

The exponential volume growth of hyperbolic geometry can embed the hierarchical relationships between states in reinforcement learning (RL) with far less distortion than Euclidean space. However, hyperbolic deep RL faces severe optimization challenges, and formal analysis of why optimization fails is lacking. We identify key factors that determine the success and failure of training hyperbolic deep RL agents. By analyzing the gradients of core operations in the Poincaré Ball and Hyperboloid models of hyperbolic geometry, we show that large-norm embeddings destabilize gradient-based training, leading to trust-region violations in proximal policy optimization (PPO). Based on these insights, we introduce Hyper++, a new hyperbolic deep RL agent that consists of three components: (1) feature regularization guaranteeing bounded norms while avoiding the curse of dimensionality from clipping;…
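To make the abstract's point about large-norm embeddings concrete, the following is a minimal sketch, not the paper's own analysis. It assumes the unit Poincaré ball with curvature -1, where the conformal factor is λ_x = 2 / (1 - ‖x‖²) and its Euclidean gradient is 4x / (1 - ‖x‖²)². The helper names (`conformal_factor`, `conformal_factor_grad_norm`, `point_with_norm`) are hypothetical and chosen only for illustration.

```python
import math


def conformal_factor(x):
    """Conformal factor lambda_x = 2 / (1 - ||x||^2) on the unit Poincare ball.

    Assumes curvature -1; x is a list of floats strictly inside the ball.
    """
    sq_norm = sum(v * v for v in x)
    if sq_norm >= 1.0:
        raise ValueError("point must lie strictly inside the unit ball")
    return 2.0 / (1.0 - sq_norm)


def conformal_factor_grad_norm(x):
    """Euclidean norm of grad lambda_x = 4 x / (1 - ||x||^2)^2."""
    sq_norm = sum(v * v for v in x)
    if sq_norm >= 1.0:
        raise ValueError("point must lie strictly inside the unit ball")
    return 4.0 * math.sqrt(sq_norm) / (1.0 - sq_norm) ** 2


def point_with_norm(r, dim=4):
    """Hypothetical helper: a point with Euclidean norm r, spread evenly over dim axes."""
    return [r / math.sqrt(dim)] * dim


if __name__ == "__main__":
    # The gradient norm grows without bound as the embedding approaches the boundary.
    for r in (0.1, 0.5, 0.9, 0.99, 0.999):
        x = point_with_norm(r)
        print(f"||x||={r:<6} lambda={conformal_factor(x):12.2f} "
              f"|grad lambda|={conformal_factor_grad_norm(x):16.2f}")
```

Under these assumptions, both the factor and its gradient diverge as ‖x‖ approaches 1, which is one simple way to see why embeddings near the boundary can produce very large updates.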

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 2

Strengths

**Clear diagnosis of instability.** The chain-rule decomposition (Eq. 3) and analysis of the conformal-factor gradient (Eq. 4) and HNN++ MLR derivative (Eq. 5) convincingly explain Poincaré instability; the Hyperboloid exponential Jacobian (Eq. 6) highlights a distinct failure mode—large Euclidean norms—justifying feature-norm control. RMSNorm placed only at the last Euclidean layer, plus a learnable radius cap, maintains capacity without per-layer SpectralNorm overhead; the bound in Prop. 4.2 t
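The norm-control idea the reviewer describes can be illustrated with a small sketch. This is an assumption-laden illustration, not the authors' implementation: it uses a gain-free RMSNorm, a fixed `max_radius` constant standing in for the learnable radius cap, and a 1/√d rescaling chosen here so that the output norm is bounded by `max_radius`. The function names are hypothetical.

```python
import math


def rms_norm(x, eps=1e-8):
    """RMSNorm without a learnable gain: divide by the root mean square of x."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def bounded_features(x, max_radius=1.0):
    """Normalize, then rescale so the Euclidean norm is at most max_radius.

    After gain-free RMSNorm the norm is at most sqrt(d), so multiplying by
    max_radius / sqrt(d) caps it at max_radius. In a trained agent max_radius
    would be a learnable parameter; here it is a plain constant (assumption).
    """
    normed = rms_norm(x)
    scale = max_radius / math.sqrt(len(x))
    return [v * scale for v in normed]


def l2_norm(x):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(v * v for v in x))


if __name__ == "__main__":
    features = [1000.0, -2000.0, 3000.0, 500.0]  # deliberately large activations
    capped = bounded_features(features, max_radius=2.0)
    print("input norm :", l2_norm(features))
    print("output norm:", l2_norm(capped))
```

The design point of the sketch is that the bound holds regardless of how large the incoming activations are, and it is applied once at the final Euclidean layer rather than at every layer.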

Weaknesses

**Breadth of algorithms.** Off-policy validation is limited to DDQN. I think modern strong baselines (e.g., SAC, DrQ-v2) would better test generality. Ablations on Euclidean + categorical critic and Euclidean + RMSNorm controls would isolate which pieces help independent of hyperbolic geometry.

**Scope of environments.** ProcGen is appropriate, but analysis claims about hierarchy/tree-like structure would be stronger with goal-conditioned or hierarchical RL tasks (MiniGrid, Crafter, options) an

Reviewer 02 · Rating 4 · Confidence 2

Strengths

- **Rich theoretical results**. The thorough analysis of the feature norms and gradients for hyperboloid MLR seems to be a contribution by itself. It also gives enough justification for components such as RMSNorm and the Hyperboloid model, which are also verified in the ablation studies.
- **Careful implementation details**. When replacing SpectralNorm with RMSNorm, the authors also point out the exponential shrinkage of the hyperbolic space after normalization and propose to rescale the features acco

Weaknesses

- **Lack of motivation for HL-Gauss**. Although all other components seem to be well-motivated, there are no sufficient justifications or observations that lead to the addition of the categorical loss. Indeed, the benefits of categorical losses are not limited to hyperbolic DRL but apply to most DRL methods in general [1,2,3].
- **Limited evaluation setup**. The proposed method is verified in 4 ProcGen and 3 Atari environments, which is a significantly smaller set than in the prior work by [4]. I suggest expa
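For readers unfamiliar with the categorical loss under discussion, here is a minimal sketch of an HL-Gauss style target: a scalar value target is spread over fixed bins using a Gaussian CDF, and the critic is trained with cross-entropy against that histogram. The bin range, bin count, and `sigma` below are illustrative assumptions, not the paper's settings, and the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def hl_gauss_target(y, v_min, v_max, num_bins, sigma):
    """Turn a scalar target y into a histogram over num_bins equal-width bins.

    Each bin receives the mass that a Gaussian N(y, sigma^2) places on it,
    renormalized to the mass falling inside [v_min, v_max].
    """
    width = (v_max - v_min) / num_bins
    edges = [v_min + i * width for i in range(num_bins + 1)]
    y = min(max(y, v_min), v_max)  # clip the target into the supported range
    cdf = [normal_cdf((e - y) / sigma) for e in edges]
    total = cdf[-1] - cdf[0]
    return [(cdf[i + 1] - cdf[i]) / total for i in range(num_bins)]


def cross_entropy(target_probs, logits):
    """Cross-entropy between a target histogram and softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -sum(p * (l - log_z) for p, l in zip(target_probs, logits))


if __name__ == "__main__":
    probs = hl_gauss_target(y=0.3, v_min=-1.0, v_max=1.0, num_bins=8, sigma=0.2)
    print("target histogram:", [round(p, 4) for p in probs])
    print("loss vs. uniform logits:", cross_entropy(probs, [0.0] * 8))
```

As the reviewer notes, nothing in this construction is specific to hyperbolic geometry; the same target can be paired with a Euclidean critic.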

Reviewer 03 · Rating 4 · Confidence 3

Strengths

- The paper is overall well-written and combines interesting ideas
- There is a quite strong theoretical section

Weaknesses

None of the ideas alone is truly novel; only the combination of several ideas is. I therefore see the paper as largely empirical in scope. There is a 30% improvement on ProcGen as well as small improvements in 3 Atari games. However, the experimental setup is not fully convincing:
- the scope is relatively limited
- the interpretation of the improvements in practice is not fully clear

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Model Reduction and Neural Networks