Hyperbolic Fine-Tuning for Large Language Models
Menglin Yang, Ram Samarth B B, Aosong Feng, Bo Xiong, Jihong Liu, Irwin King, Rex Ying

TL;DR
This paper examines the geometry of large language model token embeddings, finds evidence of hyperbolic (tree-like, hierarchical) structure, and introduces HypLoRA, a hyperbolic fine-tuning method that improves large language model performance on reasoning tasks.
Contribution
The paper uncovers hyperbolic properties in LLM embeddings and proposes HypLoRA, a novel hyperbolic fine-tuning approach that leverages these structures for improved performance.
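This summary does not spell out how HypLoRA works. As a rough illustration of what "low-rank adaptation in hyperbolic space" can look like, here is a NumPy sketch that assumes the Lorentz (hyperboloid) model with curvature -c. Inputs are lifted onto the hyperboloid with the exponential map at the origin. A rank-r map B·A then acts on the space-like coordinates, and the time coordinate is recomputed so the result stays on the manifold. Finally, the logarithmic map brings the update back to Euclidean space, where it is added to the frozen layer's output. All names (expmap0, logmap0, lorentz_lowrank, hyperbolic_lora_forward) are hypothetical, and this is not claimed to be the authors' formulation.

```python
import numpy as np

def expmap0(v, c=1.0):
    """Exponential map at the origin of the Lorentz model with curvature -c.

    v: Euclidean (tangent) vectors, shape (n, d).
    Returns points of shape (n, d + 1) on -x0^2 + |xs|^2 = -1/c, x0 > 0.
    """
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-12)
    x0 = np.cosh(sqrt_c * norm) / sqrt_c
    xs = np.sinh(sqrt_c * norm) * v / (sqrt_c * norm)
    return np.concatenate([x0, xs], axis=-1)

def logmap0(x, c=1.0):
    """Inverse of expmap0: hyperboloid points (n, d + 1) -> tangent vectors (n, d)."""
    sqrt_c = np.sqrt(c)
    x0, xs = x[..., :1], x[..., 1:]
    xs_norm = np.maximum(np.linalg.norm(xs, axis=-1, keepdims=True), 1e-12)
    dist = np.arccosh(np.maximum(sqrt_c * x0, 1.0)) / sqrt_c  # geodesic distance to origin
    return dist * xs / xs_norm

def lorentz_lowrank(x, A, B, c=1.0):
    """Hypothetical low-rank map on the hyperboloid: apply B @ A to the space-like
    coordinates, then recompute the time coordinate so outputs stay on the manifold."""
    ys = x[..., 1:] @ A.T @ B.T
    y0 = np.sqrt(1.0 / c + np.sum(ys ** 2, axis=-1, keepdims=True))
    return np.concatenate([y0, ys], axis=-1)

def hyperbolic_lora_forward(x, W, A, B, c=1.0):
    """Frozen Euclidean weight W plus a low-rank update computed in Lorentz space."""
    update = logmap0(lorentz_lowrank(expmap0(x, c), A, B, c), c)
    return x @ W.T + update

# Toy usage: the shapes (d_in=8, d_out=6, rank r=2) are arbitrary.
rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2
x = rng.normal(size=(4, d_in))
W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = 0.01 * rng.normal(size=(r, d_in))       # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-initialised
out = hyperbolic_lora_forward(x, W, A, B)
print(out.shape)  # (4, 6)
```

Initialising B to zero makes the adapter a no-op at the start of training, which is the same convention standard LoRA uses. Recomputing the time coordinate from the transformed space-like part is one common way to keep a linear map on the hyperboloid. Other constructions are possible.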
Findings
Hyperbolic characteristics are present in token embeddings.
HypLoRA outperforms traditional fine-tuning methods.
HypLoRA improves performance on reasoning benchmarks.
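One standard way to quantify "hyperbolic characteristics" of a point set is Gromov's δ-hyperbolicity. δ is 0 for tree metrics and grows as the metric departs from tree-likeness. Dividing 2δ by the diameter gives a scale-free relative δ, where values near 0 suggest tree-like structure. The sketch below computes the exact four-point δ of a small set of embedding vectors via Gromov products and a max-min matrix product. Whether the paper uses this exact estimator, Euclidean distances, or a particular sampling of tokens is an assumption here. The function names are hypothetical. Because the cost is O(n^4), larger sets are usually handled by averaging over random subsets.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X (shape (n, d))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))  # clamp tiny negatives from round-off

def gromov_delta(D):
    """Four-point Gromov delta of a finite metric given as a distance matrix D.

    For each base point w, build the Gromov products (x|y)_w and take the
    max-min product; delta is the largest violation over all base points.
    """
    n = D.shape[0]
    delta = 0.0
    for w in range(n):
        G = 0.5 * (D[:, [w]] + D[[w], :] - D)        # G[x, y] = (x|y)_w
        # maxmin[x, y] = max over z of min(G[x, z], G[z, y])
        maxmin = np.max(np.minimum(G[:, :, None], G[None, :, :]), axis=1)
        delta = max(delta, float(np.max(maxmin - G)))
    return delta

def relative_delta(X):
    """Scale-free delta_rel = 2 * delta / diameter; 0 for tree-like metrics."""
    D = pairwise_distances(X)
    return 2.0 * gromov_delta(D) / D.max()

# Sanity checks on toy point sets (not token embeddings).
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
line = np.array([[0.0], [1.0], [3.0], [6.0]])
print(relative_delta(square), relative_delta(line))
```

Collinear points form a tree metric, so their δ is 0. The corners of a unit square have δ = √2 - 1, and their relative δ is 2 - √2 ≈ 0.586.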
Abstract
Large language models (LLMs) have demonstrated remarkable performance across various tasks. However, it remains an open question whether the default Euclidean space is the most suitable choice for LLMs. In this study, we investigate the geometric characteristics of LLMs, focusing specifically on tokens and their embeddings. Our findings reveal that token frequency follows a power-law distribution, where high-frequency tokens (e.g., the, that) constitute the minority, while low-frequency tokens (e.g., apple, dog) constitute the majority. Furthermore, high-frequency tokens cluster near the origin, whereas low-frequency tokens are positioned farther from the origin in the embedding space. Additionally, token embeddings exhibit hyperbolic characteristics, indicating a latent tree-like structure within the embedding space. Motivated by these observations, we propose HypLoRA, an efficient fine-tuning…
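The two distributional claims above can be checked with simple statistics. The sketch assumes a tokenized corpus (a list of token strings) and a per-token embedding norm, which is used as the distance from the origin. It estimates a Zipf-style exponent from a least-squares fit of log frequency against log rank. This is a quick but biased estimator, and maximum-likelihood fits are preferred for serious power-law analysis. It then computes a Spearman rank correlation between token frequency and embedding norm. A negative value would be consistent with frequent tokens sitting closer to the origin. The function names and toy inputs are hypothetical, and the paper's actual measurement protocol is not described in this summary.

```python
from collections import Counter

import numpy as np

def zipf_exponent(tokens):
    """Slope s of log(count) ~ -s * log(rank), fitted by least squares."""
    counts = np.array(sorted(Counter(tokens).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(counts) + 1, dtype=float)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(counts), 1)
    return -slope

def average_ranks(a):
    """1-based ranks in which tied values share their average rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

# Hypothetical toy inputs, only to show the call pattern (not real data).
vocab_counts = {"t1": 720, "t2": 360, "t3": 240, "t4": 180}
tokens = [t for t, n in vocab_counts.items() for _ in range(n)]
norms = {"t1": 0.4, "t2": 0.5, "t3": 1.3, "t4": 1.6}
freqs = Counter(tokens)
words = sorted(freqs)
rho = spearman([freqs[w] for w in words], [norms[w] for w in words])
print(zipf_exponent(tokens), rho)
```

Average ranks matter in practice because many rare tokens share the same count. Without tie handling, the Spearman coefficient would depend on the arbitrary order of tied tokens.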
Taxonomy
Topics: Topic Modeling
