HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts
Neil He, Rishabh Anand, Hiren Madhu, Ali Maatouk, Smita Krishnaswamy, Leandros Tassiulas, Menglin Yang, Rex Ying

TL;DR
This paper introduces HELM, a family of hyperbolic large language models that operate in non-Euclidean space to better capture language's hierarchical and geometric structure, achieving improved performance over Euclidean models.
Contribution
It presents the first fully hyperbolic LLMs at billion-parameter scale, including a Mixture-of-Curvature Experts model, with new hyperbolic operations and training techniques for enhanced reasoning.
Findings
Achieved up to a 4% performance improvement over Euclidean models on benchmarks
Demonstrated the effectiveness of hyperbolic geometry in LLMs
First to train large-scale hyperbolic LLMs
Abstract
Large language models (LLMs) have shown great success in text modeling tasks across domains. However, natural language exhibits inherent semantic hierarchies and nuanced geometric structure, which current LLMs do not capture completely owing to their reliance on Euclidean operations. Recent studies have also shown that not respecting the geometry of token embeddings leads to training instabilities and degradation of generative capabilities. These findings suggest that shifting to non-Euclidean geometries can better align language models with the underlying geometry of text. We thus propose to operate fully in hyperbolic space, known for its expansive, scale-free, and low-distortion properties. To this end, we introduce HELM, a family of HypErbolic Large Language Models, offering a geometric rethinking of the Transformer-based LLM that addresses the representational inflexibility, missing set of…
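To make the abstract's notion of hyperbolic space concrete, here is a minimal sketch of geodesic distance in the Lorentz (hyperboloid) model, one standard way to represent hyperbolic space. It is an illustration under stated assumptions, not HELM's implementation: the function names `lorentz_inner`, `lift`, and `lorentz_distance` and the curvature parameter `k` (the space has curvature -k) are hypothetical and do not come from the paper.

```python
import math

# Illustrative sketch only; names and parameterization are assumptions,
# not taken from the HELM paper or its code.

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lift(v, k=1.0):
    """Lift spatial coordinates v onto the hyperboloid <x, x>_L = -1/k."""
    x0 = math.sqrt(1.0 / k + sum(a * a for a in v))
    return [x0] + list(v)

def lorentz_distance(x, y, k=1.0):
    """Geodesic distance between two hyperboloid points at curvature -k."""
    # Clamp to 1.0 to guard against floating-point round-off below acosh's domain.
    arg = max(1.0, -k * lorentz_inner(x, y))
    return math.acosh(arg) / math.sqrt(k)

origin = lift([0.0, 0.0])
point = lift([math.sinh(1.0), 0.0])
print(lorentz_distance(origin, point))  # approximately 1.0
```

The curvature `k` is the kind of quantity a mixture-of-curvature design would vary across experts. In this sketch, the same geodesic radius corresponds to different spatial coordinates at different curvatures.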
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Softmax · Attention Is All You Need · LLaMA · ALIGN · Sparse Evolutionary Training
