HyperAlign: Hyperbolic Entailment Cones for Adaptive Text-to-Image Alignment Assessment

Wenzhi Chen; Bo Hu; Leida Li; Lihuo He; Wen Lu; Xinbo Gao

arXiv:2601.04614·cs.CV·March 20, 2026


Open Access

TL;DR

HyperAlign introduces a hyperbolic geometric framework for more accurate and adaptive assessment of text-to-image alignment, surpassing Euclidean-based methods in evaluation tasks.

Contribution

The paper proposes a hyperbolic entailment geometry approach that combines dynamic-supervision entailment modeling with an adaptive modulation regressor to improve text-to-image alignment assessment.

Findings

1. Achieves state-of-the-art performance on evaluation benchmarks.

2. Demonstrates strong cross-database generalization.

3. Validates effectiveness of hyperbolic modeling for semantic alignment.

Abstract

With the rapid development of text-to-image generation technology, accurately assessing the alignment between generated images and text prompts has become a critical challenge. Existing methods rely on Euclidean space metrics, neglecting the structured nature of semantic alignment, while lacking adaptive capabilities for different samples. To address these limitations, we propose HyperAlign, an adaptive text-to-image alignment assessment framework based on hyperbolic entailment geometry. First, we extract Euclidean features using CLIP and map them to hyperbolic space. Second, we design a dynamic-supervision entailment modeling mechanism that transforms discrete entailment logic into continuous geometric structure supervision. Finally, we propose an adaptive modulation regressor that utilizes hyperbolic geometric features to generate sample-level modulation parameters, adaptively…
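
The first two steps of the abstract (mapping Euclidean features into hyperbolic space, then scoring entailment geometrically) can be illustrated with a minimal sketch. The abstract does not say which hyperbolic model, curvature, or cone definition HyperAlign uses, so everything here is an assumption: the sketch uses the Poincaré ball, the standard exponential map at the origin, and the entailment-cone aperture and angle formulas from Ganea et al. (2018) on the unit ball. The function names, the aperture constant `K`, and the random vectors standing in for CLIP features are all hypothetical.

```python
import numpy as np

def exp_map_origin(v, c=1.0, eps=1e-9):
    """Map a Euclidean vector v (tangent at the origin) onto the
    Poincare ball of curvature -c. Assumed model, not confirmed by the paper."""
    norm = np.linalg.norm(v)
    if norm < eps:
        return np.zeros_like(v, dtype=float)
    sqrt_c = np.sqrt(c)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points inside the Poincare ball."""
    sq_diff = np.sum((x - y) ** 2)
    denom = (1.0 - c * np.sum(x ** 2)) * (1.0 - c * np.sum(y ** 2))
    return np.arccosh(1.0 + 2.0 * c * sq_diff / denom) / np.sqrt(c)

def half_aperture(x, K=0.1):
    """Half-aperture of the entailment cone with apex x (unit ball, c = 1).
    Only meaningful for x away from the origin; K is a hypothetical constant."""
    norm = np.linalg.norm(x)
    return np.arcsin(np.clip(K * (1.0 - norm ** 2) / norm, -1.0, 1.0))

def cone_angle(x, y):
    """Angle at x between the cone axis (ray from the origin through x)
    and the geodesic from x to y, on the unit Poincare ball."""
    xy = np.dot(x, y)
    nx2 = np.dot(x, x)
    ny2 = np.dot(y, y)
    num = xy * (1.0 + nx2) - nx2 * (1.0 + ny2)
    den = np.sqrt(nx2) * np.linalg.norm(x - y) * np.sqrt(1.0 + nx2 * ny2 - 2.0 * xy)
    return np.arccos(np.clip(num / den, -1.0, 1.0))

def cone_energy(x, y, K=0.1):
    """Zero when y lies inside the cone of x, positive otherwise."""
    return float(max(0.0, cone_angle(x, y) - half_aperture(x, K)))

# Random vectors stand in for CLIP text and image features (illustration only).
rng = np.random.default_rng(0)
text_point = exp_map_origin(0.1 * rng.normal(size=8))
image_point = exp_map_origin(0.3 * rng.normal(size=8))
print("distance:", poincare_distance(text_point, image_point))
print("entailment energy:", cone_energy(text_point, image_point))
```

In this sketch the text embedding plays the role of the more general concept (cone apex) and the image the more specific one, so a low energy means the image falls inside the text's cone. Whether HyperAlign assigns the roles this way, and how it turns the energy into a continuous supervision signal, is not stated in the truncated abstract.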


Taxonomy

Topics: Multimodal Machine Learning Applications · Data Visualization and Analytics · Advanced Image and Video Retrieval Techniques