Contrastive Language-Image Pre-Training with Knowledge Graphs

Xuran Pan; Tianzhu Ye; Dongchen Han; Shiji Song; Gao Huang

arXiv:2210.08901 · cs.CV · October 18, 2022 · 23 citations

TL;DR

This paper introduces Knowledge-CLIP, a pre-training framework that incorporates semantic knowledge graphs into vision-language models to improve semantic alignment and reasoning across modalities.

Contribution

It presents a novel knowledge-based pre-training approach that enhances CLIP by injecting semantic information from knowledge graphs, improving cross-modal understanding.

Findings

1. Outperforms original CLIP on multiple vision-language tasks.

2. Enhances semantic alignment and reasoning capabilities.

3. Demonstrates effectiveness of knowledge integration in pre-training.

Abstract

Recent years have witnessed the fast development of large-scale pre-training frameworks that can extract multi-modal representations in a unified form and achieve promising performances when transferred to downstream tasks. Nevertheless, existing approaches mainly focus on pre-training with simple image-text pairs, while neglecting the semantic connections between concepts from different modalities. In this paper, we propose a knowledge-based pre-training framework, dubbed Knowledge-CLIP, which injects semantic information into the widely used CLIP model. Through introducing knowledge-based objectives in the pre-training process and utilizing different types of knowledge graphs as training data, our model can semantically align the representations in vision and language with higher quality, and enhance the reasoning ability across scenarios and modalities. Extensive experiments on…
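For orientation, the following is a minimal sketch of the standard symmetric contrastive objective used by the original CLIP model that Knowledge-CLIP builds on. It is not the paper's knowledge-based objective, which the abstract does not spell out. The function names, the plain-list embeddings, and the temperature value of 0.07 are illustrative assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss for a batch where image i is paired with text i.

    Hypothetical helper for illustration; temperature=0.07 is an assumed default.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # logits[i][j]: cosine similarity of image i and text j, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Image-to-text direction: each image should select its own text (the diagonal).
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-image direction: each text should select its own image.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return (i2t + t2i) / 2


# Toy 2-D embeddings: matched pairs point the same way, mismatched pairs do not.
aligned = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the correctly paired embeddings yield a much smaller loss than the shuffled ones, which is the behavior the contrastive objective rewards. A knowledge-graph variant would presumably draw its positives and negatives from graph structure rather than from plain image-text pairs, but that is an inference from the abstract, not a description of the authors' method.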

Videos

Contrastive Language-Image Pre-Training with Knowledge Graphs · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Language-Image Pre-training · ALIGN