HexFormer: Hyperbolic Vision Transformer with Exponential Map Aggregation

Haya Alyoussef; Ahmad Bdeir; Diego Coello de Portugal Mecke; Tom Hanika; Niels Landwehr; Lars Schmidt-Thieme

arXiv:2601.19849 · cs.CV · January 28, 2026

TL;DR

HexFormer introduces a hyperbolic vision transformer utilizing exponential map aggregation, which improves accuracy and gradient stability in image classification tasks by effectively modeling hierarchical data structures.

Contribution

This work presents a novel hyperbolic vision transformer with exponential map aggregation, demonstrating improved performance and training stability over Euclidean models.

Findings

01

Consistent performance improvements over Euclidean baselines.

02

Hyperbolic models exhibit more stable gradients.

03

Hybrid variant achieves the strongest results.

Abstract

Data across modalities such as images, text, and graphs often contains hierarchical and relational structures, which are challenging to model within Euclidean geometry. Hyperbolic geometry provides a natural framework for representing such structures. Building on this property, this work introduces HexFormer, a hyperbolic vision transformer for image classification that incorporates exponential map aggregation within its attention mechanism. Two designs are explored: a hyperbolic ViT (HexFormer) and a hybrid variant (HexFormer-Hybrid) that combines a hyperbolic encoder with a Euclidean linear classification head. HexFormer incorporates a novel attention mechanism based on exponential map aggregation, which yields more accurate and stable aggregated representations than standard centroid-based averaging, showing that simpler approaches retain competitive merit. Experiments across…
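The abstract contrasts exponential map aggregation with centroid-based averaging without spelling out either operation. The following is a minimal sketch, not the authors' code, assuming the Lorentz (hyperboloid) model with curvature -1 and assuming that ExpAgg takes the attention-weighted mean of the value vectors in the tangent space at the origin and maps it back with the exponential map. The centroid baseline is assumed to be the weighted ambient sum rescaled onto the hyperboloid, and `hybrid_head` is a hypothetical guess at how a Euclidean linear head could consume hyperbolic features through the logarithmic map. All function names here are illustrative.

```python
import math

# Sketch only (assumption): Lorentz model, curvature -1, points stored as
# [x0, x1, ..., xn] with <x, x>_L = -1 and x0 > 0. The origin is o = [1, 0, ..., 0].

def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def exp0(u):
    """Exponential map at o; u is the spatial part of a tangent vector at o."""
    r = math.sqrt(sum(c * c for c in u))
    if r < 1e-12:
        return [1.0] + [0.0] * len(u)
    scale = math.sinh(r) / r
    return [math.cosh(r)] + [scale * c for c in u]

def log0(x):
    """Logarithmic map at o; returns the spatial part of the tangent vector."""
    norm = math.sqrt(sum(c * c for c in x[1:]))
    if norm < 1e-12:
        return [0.0] * (len(x) - 1)
    r = math.acosh(max(x[0], 1.0))  # geodesic distance from o; clamp guards rounding
    return [r * c / norm for c in x[1:]]

def exp_map_aggregate(weights, points):
    """Hypothetical ExpAgg: weighted mean in the tangent space at o, mapped back by exp0."""
    dim = len(points[0]) - 1
    tangent = [0.0] * dim
    for w, p in zip(weights, points):
        for i, v in enumerate(log0(p)):
            tangent[i] += w * v
    return exp0(tangent)

def lorentz_centroid(weights, points):
    """Centroid baseline (assumed form): weighted ambient sum rescaled onto the hyperboloid."""
    y = [sum(w * p[i] for w, p in zip(weights, points)) for i in range(len(points[0]))]
    norm = math.sqrt(abs(lorentz_inner(y, y)))
    return [c / norm for c in y]

def hybrid_head(x, weight, bias):
    """Hypothetical Euclidean linear head: log-map the feature to R^n, then apply W z + b."""
    z = log0(x)
    return [sum(wij * zj for wij, zj in zip(row, z)) + b for row, b in zip(weight, bias)]

# Usage: aggregate two value vectors with equal attention weights.
values = [exp0([0.5, 0.0]), exp0([0.0, 0.5])]
attn = [0.5, 0.5]
agg = exp_map_aggregate(attn, values)
cen = lorentz_centroid(attn, values)
print(lorentz_inner(agg, agg), lorentz_inner(cen, cen))  # both approximately -1.0
print(hybrid_head(agg, [[1.0, -1.0]], [0.0]))
```

Under these assumptions, both aggregates land on the hyperboloid and differ only in where the averaging happens: in the tangent space at the origin for the exponential-map version, versus in the ambient Minkowski space followed by a rescaling for the centroid.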

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 8 · Confidence 4

Strengths

The Exponential Map Aggregation (ExpAgg) scheme is robust, mathematically sound, and original. The paper provides strong quantitative and qualitative evidence that hyperbolic representations enhance optimization dynamics. HexFormer exhibits minimal reliance on warmup schedules compared to Euclidean baselines (Table 2) and demonstrates smoother, more consistent gradient distributions (Figure 3). This is a major finding for the practical adoption of hyperbolic models. HexFormer-Tiny (∼3M params

Weaknesses

The HexFormer-Hybrid variant performs best, raising the question of why a strictly Euclidean classification head is superior after a hyperbolic encoder. Providing a training time comparison would also be beneficial. While the CIFAR-10 results in Table 2 support the claim, Tiny-ImageNet does not clearly show the difference between the Euclidean and hyperbolic models. I would recommend citing hyperbolic learning surveys such as “Hyperbolic Deep Learning in Computer Vision: A Survey”. Multiple citations in lines 105-106

Reviewer 02 · Rating 4 · Confidence 3

Strengths

The main advantages of this paper are: (1) exploring more forms of hyperbolic-Euclidean hybrid structures; (2) emphasizing the stability and advantages brought by exponential map aggregation.

Weaknesses

The paper's experimental validation focuses primarily on small image classification datasets (CIFAR-10, CIFAR-100, Tiny-ImageNet), lacking verification on more challenging downstream tasks such as object detection, segmentation, or real-world scenario data. This limitation raises questions about the model's generalization capabilities and practical deployment value. The paper explicitly acknowledges that comprehensive hyperparameter optimization was conducted only for the ViT-Tiny scale, while

Reviewer 03 · Rating 4 · Confidence 3

Strengths

**Novel geometric perspective:** This paper introduces hyperbolic geometry into vision transformers, which is original and conceptually meaningful, extending non-Euclidean representation learning to high-dimensional visual domains. **Theoretical support:** The mathematical formulation of the hyperbolic projection and attention mechanism is clearly presented and theoretically sound. **Well-organized:** The paper is well organized and easy to follow, with intuitive figures and insightful visualiz

Weaknesses

**Limited experimental scope and generality:** The experiments are limited to image classification tasks on a small number of datasets, which makes the empirical validation narrow and the method’s generality questionable. Since the proposed hyperbolic representation aims to capture hierarchical relationships, demonstrating its benefits on diverse tasks such as detection, segmentation, or retrieval would be crucial for broader applicability. **Ablation:** The empirical study lacks a systematic a

Taxonomy

Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Advanced Graph Neural Networks