Uncertainty-guided Compositional Alignment with Part-to-Whole Semantic Representativeness in Hyperbolic Vision-Language Models

Hayeon Kim; Ji Ha Jang; Junghun James Kim; Se Young Chun

arXiv:2603.22042·cs.CV·March 25, 2026

TL;DR

This paper introduces UNCHA, a hyperbolic vision-language model that uses uncertainty modeling to capture part-to-whole hierarchical relationships, improving understanding of multi-object scenes and achieving state-of-the-art zero-shot classification results.

Contribution

It proposes a new uncertainty-guided compositional alignment method in hyperbolic VLMs that models part-to-whole semantic representativeness with uncertainty and enhances hierarchical understanding.
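
To make the idea of uncertainty-guided part-to-whole alignment concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the source does not say how UNCHA computes hyperbolic uncertainty or its alignment objective. The sketch assumes the Lorentz (hyperboloid) model with curvature -1, treats each part's uncertainty as a given positive scalar, and uses a softmax over negative uncertainty as a hypothetical weighting rule, so that parts with lower uncertainty contribute more to the part-to-whole distance. It also uses plain geodesic distance rather than an entailment loss. All function names are illustrative.

```python
import math

def exp_map_origin(v):
    """Lift a Euclidean vector onto the unit hyperboloid (curvature -1, assumed)."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        return [1.0] + [0.0] * len(v)
    scale = math.sinh(norm) / norm
    # First coordinate is the time-like component, the rest are space-like.
    return [math.cosh(norm)] + [scale * c for c in v]

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lorentz_distance(x, y):
    """Geodesic distance on the hyperboloid; the clamp guards against rounding."""
    return math.acosh(max(1.0, -lorentz_inner(x, y)))

def uncertainty_weights(uncertainties):
    """Hypothetical rule: softmax over negative uncertainty (lower u, larger weight)."""
    m = min(uncertainties)
    exps = [math.exp(-(u - m)) for u in uncertainties]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_part_whole_distance(whole, parts, uncertainties):
    """Average part-to-whole distance, weighted by the illustrative rule above."""
    weights = uncertainty_weights(uncertainties)
    return sum(w * lorentz_distance(p, whole) for w, p in zip(weights, parts))

# Toy example: one whole-scene embedding and two part embeddings.
whole = exp_map_origin([0.2, 0.1])
parts = [exp_map_origin([0.6, 0.3]), exp_map_origin([-0.5, 0.9])]
uncertainties = [0.1, 2.0]  # made-up values: the first part is "more representative"
print(uncertainty_weights(uncertainties))
print(weighted_part_whole_distance(whole, parts, uncertainties))
```

In this toy setup the first part dominates the weighted distance because its uncertainty is lower. A real model would predict the uncertainties from the embeddings rather than take them as inputs.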

Findings

01. Achieves state-of-the-art performance on zero-shot classification.

02. Improves multi-object scene understanding through better part-whole ordering.

03. Enhances hyperbolic embeddings with uncertainty modeling.

Abstract

While Vision-Language Models (VLMs) have achieved remarkable performance, their Euclidean embeddings remain limited in capturing hierarchical relationships such as part-to-whole or parent-child structures, and often face challenges in multi-object compositional scenarios. Hyperbolic VLMs mitigate this issue by better preserving hierarchical structures and modeling part-whole relations (i.e., whole scene and its part images) through entailment. However, existing approaches do not model that each part has a different level of semantic representativeness to the whole. We propose UNcertainty-guided Compositional Hyperbolic Alignment (UNCHA) for enhancing hyperbolic VLMs. UNCHA models part-to-whole semantic representativeness with hyperbolic uncertainty, by assigning lower uncertainty to more representative parts and higher uncertainty to less representative ones for the whole scene. This…

Code & Models

Models

🤗 hayeonkim/uncha · model · 8 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques