HexFormer: Hyperbolic Vision Transformer with Exponential Map Aggregation
Haya Alyoussef, Ahmad Bdeir, Diego Coello de Portugal Mecke, Tom Hanika, Niels Landwehr, Lars Schmidt-Thieme

TL;DR
HexFormer introduces a hyperbolic vision transformer utilizing exponential map aggregation, which improves accuracy and gradient stability in image classification tasks by effectively modeling hierarchical data structures.
Contribution
This work presents a novel hyperbolic vision transformer with exponential map aggregation, demonstrating improved performance and training stability over Euclidean models.
Findings
Consistent performance improvements over Euclidean baselines.
Hyperbolic models exhibit more stable gradients.
Hybrid variant achieves the strongest results.
Abstract
Data across modalities such as images, text, and graphs often contains hierarchical and relational structures, which are challenging to model within Euclidean geometry. Hyperbolic geometry provides a natural framework for representing such structures. Building on this property, this work introduces HexFormer, a hyperbolic vision transformer for image classification that incorporates exponential map aggregation within its attention mechanism. Two designs are explored: a hyperbolic ViT (HexFormer) and a hybrid variant (HexFormer-Hybrid) that combines a hyperbolic encoder with a Euclidean linear classification head. HexFormer incorporates a novel attention mechanism based on exponential map aggregation, which yields more accurate and stable aggregated representations than standard centroid-based averaging, showing that simpler approaches retain competitive merit. Experiments across…
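To make the contrast between exponential map aggregation and centroid-based averaging concrete, the following is a minimal NumPy sketch under stated assumptions, not the paper's implementation. It assumes the Lorentz (hyperboloid) model with curvature -1 and tangent-space maps taken at the origin; the abstract does not specify either choice. All function names (`expmap0`, `logmap0`, `tangent_aggregate`, `lorentz_centroid`) are hypothetical. One plausible reading of "exponential map aggregation" is: lift value vectors to the tangent space at the origin, mix them with the attention weights, and map the result back with the exponential map. The centroid function is a normalized weighted sum of points on the hyperboloid, included only as a comparison baseline.

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)

def expmap0(v, eps=1e-9):
    """Exponential map at the origin: tangent vectors (..., d) -> hyperboloid points (..., d+1)."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    time = np.cosh(norm)
    space = np.sinh(norm) * v / np.maximum(norm, eps)
    return np.concatenate([time, space], axis=-1)

def logmap0(x, eps=1e-9):
    """Logarithmic map at the origin: hyperboloid points (..., d+1) -> tangent vectors (..., d)."""
    time, space = x[..., :1], x[..., 1:]
    norm = np.linalg.norm(space, axis=-1, keepdims=True)
    dist = np.arccosh(np.maximum(time, 1.0))  # clamp guards against rounding below 1
    return dist * space / np.maximum(norm, eps)

def tangent_aggregate(weights, values):
    """Assumed aggregation: mix values in the tangent space at the origin, then map back.

    weights: (m, n) attention weights, each row summing to 1.
    values:  (n, d+1) points on the hyperboloid.
    """
    mixed = weights @ logmap0(values)  # (m, d) weighted tangent vectors
    return expmap0(mixed)              # (m, d+1) points on the hyperboloid

def lorentz_centroid(weights, values, eps=1e-9):
    """Baseline: weighted sum of points, rescaled back onto the hyperboloid."""
    s = weights @ values                              # (m, d+1)
    scale = np.sqrt(np.maximum(-lorentz_inner(s, s), eps))
    return s / scale[..., None]
```

Both functions take row-stochastic weights (for example, a softmax over attention scores) and return points satisfying $\langle x, x \rangle_L = -1$. They generally give different outputs for the same weights, which is the kind of difference the abstract's comparison refers to.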
Peer Reviews
Decision·Submitted to ICLR 2026
The Exponential Map Aggregation (ExpAgg) scheme is robust, mathematically sound, and original. The paper provides strong quantitative and qualitative evidence that hyperbolic representations enhance optimization dynamics. HexFormer exhibits minimal reliance on warmup schedules compared to Euclidean baselines (Table 2) and demonstrates smoother, more consistent gradient distributions (Figure 3). This is a major finding for the practical adoption of hyperbolic models. HexFormer-Tiny (∼3M params
The HexFormer-Hybrid variant performs best, raising the question of why a strictly Euclidean classification head is superior after a hyperbolic encoder. Providing a training time comparison would also be beneficial. While the CIFAR-10 results in Table 2 support the claim, the Tiny-ImageNet results do not fully show the difference between Euclidean and hyperbolic models. I would recommend citing hyperbolic learning surveys such as “Hyperbolic Deep Learning in Computer Vision: A Survey”. Multiple citations in lines 105-106
The main advantages of this paper are: (1) exploring more forms of hyperbolic-Euclidean hybrid structures; (2) emphasizing the stability and advantages brought by exponential map aggregation
The paper's experimental validation focuses primarily on small image classification datasets (CIFAR-10, CIFAR-100, Tiny-ImageNet), lacking verification on more challenging downstream tasks such as object detection, segmentation, or real-world scenario data. This limitation raises questions about the model's generalization capabilities and practical deployment value. The paper explicitly acknowledges that comprehensive hyperparameter optimization was conducted only for the ViT-Tiny scale, while
**Novel geometric perspective:** This paper introduces hyperbolic geometry into vision transformers, which is original and conceptually meaningful, extending non-Euclidean representation learning to high-dimensional visual domains. **Theoretical Support:** The mathematical formulation of the hyperbolic projection and attention mechanism is clearly presented and theoretically sound. **Well-organized:** The paper is well organized and easy to follow, with intuitive figures and insightful visualiz
**Limited experimental scope and generality:** The experiments are limited to image classification tasks on a small number of datasets, which makes the empirical validation narrow and the method’s generality questionable. Since the proposed hyperbolic representation aims to capture hierarchical relationships, demonstrating its benefits on diverse tasks such as detection, segmentation, or retrieval would be crucial for broader applicability. **Ablation:** The empirical study lacks a systematic a
Taxonomy
Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Advanced Graph Neural Networks
