Global and Local Entailment Learning for Natural World Imagery

Srikumar Sastry, Aayush Dhakal, Eric Xing, Subash Khanal, Nathan Jacobs

arXiv:2506.21476 · cs.CV · June 27, 2025

TL;DR

This paper introduces Radial Cross-Modal Embeddings (RCME), a novel framework that explicitly models transitive entailment in vision-language models, improving hierarchical understanding and classification performance.

Contribution

The paper proposes RCME, a framework that explicitly models the transitivity of entailment, so that vision-language models can represent hierarchies of concepts.

Findings

01 Enhanced hierarchical species classification accuracy

02 Improved performance on hierarchical retrieval tasks

03 Open-source code and models available

Abstract

Learning the hierarchical structure of data in vision-language models is a significant challenge. Previous works have attempted to address this challenge by employing entailment learning. However, these approaches fail to model the transitive nature of entailment explicitly, which establishes the relationship between order and semantics within a representation space. In this work, we introduce Radial Cross-Modal Embeddings (RCME), a framework that enables the explicit modeling of transitivity-enforced entailment. Our proposed framework optimizes for the partial order of concepts within vision-language models. By leveraging our framework, we develop a hierarchical vision-language foundation model capable of representing the hierarchy in the Tree of Life. Our experiments on hierarchical species classification and hierarchical retrieval tasks demonstrate the enhanced performance of our…
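The abstract's key point is that entailment should be transitive: if a species entails its genus and the genus entails its family, then the species must entail the family. This page does not describe RCME's radial formulation, so the sketch below instead uses a different, well-known technique, the order-violation penalty from order embeddings (Vendrov et al., 2016), to show what a transitivity-respecting partial order over embeddings looks like. The function names and the 2-D "Tree of Life" vectors are hypothetical and hand-picked. They are not outputs of RCME or of any trained model.

```python
from typing import Sequence


def order_violation(specific: Sequence[float], general: Sequence[float]) -> float:
    """Order-violation penalty in the style of order embeddings (Vendrov et al., 2016).

    Returns 0.0 exactly when `specific` >= `general` in every coordinate, i.e. when
    the specific concept sits above the general one in the coordinatewise partial
    order; otherwise returns a positive sum of squared violations.
    """
    if len(specific) != len(general):
        raise ValueError("embeddings must have the same dimension")
    return sum(max(0.0, g - s) ** 2 for s, g in zip(specific, general))


def entails(specific: Sequence[float], general: Sequence[float]) -> bool:
    """True if `specific` entails `general` under the coordinatewise order."""
    return order_violation(specific, general) == 0.0


# Hypothetical, hand-picked 2-D embeddings along one Tree of Life path
# (illustrative only; not produced by RCME or any trained model).
felidae = [0.2, 0.1]       # family: most general, closest to the origin
panthera = [0.5, 0.3]      # genus
panthera_leo = [0.9, 0.4]  # species: most specific
canis_lupus = [0.1, 0.8]   # a species outside Felidae

# Coordinatewise >= is itself a partial order, so transitivity holds by
# construction: species->genus and genus->family imply species->family.
print(entails(panthera_leo, panthera))  # True
print(entails(panthera, felidae))       # True
print(entails(panthera_leo, felidae))   # True
print(entails(canis_lupus, felidae))    # False
```

In Vendrov et al.'s setup, this penalty is driven to zero on true (specific, general) pairs and pushed above a margin on negative pairs. A score such as a thresholded cosine similarity gives no such transitivity guarantee, which is the gap the abstract says RCME addresses. For the authors' actual method, see the paper and the mvrl/RCME repository.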

Code & Models

Repositories

mvrl/RCME (PyTorch)

Models

Hugging Face: MVRL/rcme-tol-vit-base-patch16 (model)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications