Hyperbolic Aware Minimization: Implicit Bias for Sparsity
Tom Jacobs, Advait Gadhikar, Celia Rubio-Madrigal, and Rebekka Burkholz

TL;DR
This paper introduces Hyperbolic Aware Minimization (HAM), a new optimization technique that leverages hyperbolic geometry to promote sparsity and improve generalization in deep models, especially when combined with sparsification methods.
Contribution
HAM is a novel optimization method that alternates standard steps with hyperbolic mirror steps, overcoming the small-inverse-metric issue and enhancing model sparsity and performance.
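For context, the small-inverse-metric issue can be seen in a short derivation. This is a sketch under assumed notation: the symbols $x$, $m$, $w$, $c$ and the loss $f$ are illustrative and do not come from this summary. Write each parameter as a pointwise product $x = m \odot w$ and run gradient flow on $m$ and $w$:

$$
\begin{aligned}
\dot m &= -w \odot \nabla f(x), \qquad \dot w = -m \odot \nabla f(x), \\
\frac{d}{dt}\left(m^2 - w^2\right) &= -2\, m \odot w \odot \nabla f(x) + 2\, m \odot w \odot \nabla f(x) = 0
\quad\Longrightarrow\quad m^2 - w^2 = c, \\
\dot x &= \dot m \odot w + m \odot \dot w = -\left(m^2 + w^2\right) \odot \nabla f(x) = -\sqrt{4x^2 + c^2} \odot \nabla f(x).
\end{aligned}
$$

The last equality uses $(m^2 + w^2)^2 = (m^2 - w^2)^2 + 4 m^2 w^2 = c^2 + 4x^2$, with all operations taken coordinate-wise. Under a nearly balanced initialization ($c \approx 0$), the factor $\sqrt{4x^2 + c^2}$, which plays the role of the inverse metric, is close to $2|x|$ and vanishes as $x \to 0$. Parameters near zero therefore move slowly and rarely cross zero.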
Findings
HAM improves performance in underdetermined linear regression.
HAM enhances sparsity when combined with sparsification methods.
Experiments show HAM's effectiveness on standard vision benchmarks.
Abstract
Understanding the implicit bias of optimization algorithms is key to explaining and improving the generalization of deep models. The hyperbolic implicit bias induced by pointwise overparameterization promotes sparsity, but also yields a small inverse Riemannian metric near zero, slowing down parameter movement and impeding meaningful parameter sign flips. To overcome this obstacle, we propose Hyperbolic Aware Minimization (HAM), which alternates a standard optimizer step with a lightweight hyperbolic mirror step. The mirror step incurs less compute and memory than pointwise overparameterization, reproduces its beneficial hyperbolic geometry for feature learning, and mitigates the small-inverse-metric bottleneck. Our characterization of the implicit bias in the context of underdetermined linear regression provides insights into the mechanism by which HAM consistently increases performance…
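The alternation described in the abstract can be sketched on a toy underdetermined regression problem. This summary does not give the exact update rule, so the multiplicative form of the mirror step below, the way `alpha` and `beta` enter it, and all function names (`ham_step`, `run_ham`, `make_problem`) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def ham_step(x, grad, lr=0.01, alpha=1.0, beta=0.0):
    """One alternating update: a plain gradient step, then a multiplicative step.

    The multiplicative form and the roles of alpha and beta are assumptions
    made for illustration; they are not taken from the paper.
    """
    # Standard optimizer step (plain gradient descent here).
    x_half = x - lr * grad
    # Assumed hyperbolic mirror step: rescale each coordinate by an
    # exponential factor, using the sign of the previous iterate x.
    return x_half * np.exp(-lr * (alpha * np.sign(x) * grad + beta))


def least_squares_loss(A, b, x):
    """Half the mean squared residual of the linear system A x = b."""
    residual = A @ x - b
    return 0.5 * np.mean(residual ** 2)


def least_squares_grad(A, b, x):
    """Gradient of least_squares_loss with respect to x."""
    return A.T @ (A @ x - b) / A.shape[0]


def run_ham(A, b, x0, steps=2000, lr=0.01, alpha=1.0, beta=0.0):
    """Apply ham_step repeatedly to the least-squares objective."""
    x = x0.copy()
    for _ in range(steps):
        x = ham_step(x, least_squares_grad(A, b, x), lr, alpha, beta)
    return x


def make_problem(seed=0, n=5, d=20, k=3):
    """Hypothetical toy problem: n equations, d > n unknowns, k-sparse target."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    x_true = np.zeros(d)
    x_true[:k] = 1.0
    return A, A @ x_true, x_true


if __name__ == "__main__":
    A, b, x_true = make_problem()
    x0 = np.full(A.shape[1], 0.1)
    x_hat = run_ham(A, b, x0)
    print("initial loss:", least_squares_loss(A, b, x0))
    print("final loss:  ", least_squares_loss(A, b, x_hat))
```

In this sketch, the exponential factor is always positive, so the multiplicative step only rescales the result of the standard step; any sign flip has to come from the standard step itself. With a zero gradient, a positive `beta` shrinks every coordinate toward zero, which is one plausible way a sparsity-promoting term could enter.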
Peer Reviews
Decision: ICLR 2026 Poster
- Clear motivation and presentation.
- The method is compatible with existing optimizers (such as SAM) and requires little computational overhead.
- Strong empirical results, with the method surprisingly also improving dense training.
- Thorough theoretical analysis.
- The method adds two hyperparameters ($\alpha$, $\beta$). Based on Figures 8 and 9 in the appendix, the performance of HAM is quite sensitive to these hyperparameters, and the optimal hyperparameters seem to differ between models, making it difficult to tune.
- It seems that although HAM is motivated from the perspective of sparse training, it doesn't actually improve sparse training directly without pairing it with another method (Remarks 4.7 and 4.8). Given this, the motivation behind
This paper proposes an interesting optimizer modification that attempts to reap the benefits of "point-wise" overparameterization via parameter-wise multiplication without introducing a memory overhead. The sign-adjustment using the previous iterate in the hyperbolic step seems to be novel, and has an interesting interpretation via Riemannian gradient flow. The experiments seem to demonstrate that HAM provides modest improvements across many scenarios as a drop-in adjustment, which seems to impl
Overall, Sections 3 and 4 do not make the main motivating purpose of HAM clear to me. In particular, it is claimed via a Riemannian gradient flow analysis that HAM converges faster than gradient flow, but the main numerical results do not seem to focus on this. It is then claimed that HAM encourages a particular kind of implicit sparsity bias, but the authors note in Remark 4.7 that this bias is relatively weak on its own. As such, it would be good to clarify the core motivation of HAM, an
1. The paper is well-written and clearly identifies a gap in the existing literature.
2. The theoretical contribution is clear.
3. The experiments appear to support the claims.
While I did not see any obvious weaknesses, I also cannot strongly endorse this paper as I am an outsider to this area.
