VL-KGE: Vision-Language Models Meet Knowledge Graph Embeddings

Athanasios Efthymiou; Stevan Rudinac; Monika Kackovic; Nachoem Wijnberg; Marcel Worring

arXiv:2603.02435·cs.AI·March 16, 2026

Open Access

TL;DR

VL-KGE leverages vision-language models to create unified, multimodal embeddings for knowledge graphs, significantly improving link prediction by better aligning diverse modalities.

Contribution

This paper introduces VL-KGE, a novel framework that combines VLMs with structured relational modeling for enhanced multimodal knowledge graph embeddings.

Findings

01. VL-KGE outperforms traditional KGE methods in link prediction.

02. Experiments on WN9-IMG and new art MKGs show improved multimodal reasoning.

03. VLMs effectively align diverse modalities within knowledge graphs.

Abstract

Real-world multimodal knowledge graphs (MKGs) are inherently heterogeneous, modeling entities that are associated with diverse modalities. Traditional knowledge graph embedding (KGE) methods excel at learning continuous representations of entities and relations, yet they are typically designed for unimodal settings. Recent approaches extend KGE to multimodal settings but remain constrained, often processing modalities in isolation, resulting in weak cross-modal alignment, and relying on simplistic assumptions such as uniform modality availability across entities. Vision-Language Models (VLMs) offer a powerful way to align diverse modalities within a shared embedding space. We propose Vision-Language Knowledge Graph Embeddings (VL-KGE), a framework that integrates cross-modal alignment from VLMs with structured relational modeling to learn unified multimodal representations of knowledge…
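
To make the link-prediction setting concrete, the following is a minimal sketch and not the VL-KGE method itself. It assumes each entity has optional image and text embeddings that already live in a shared space (as a VLM would provide), fuses whichever modalities are available by simple averaging, and scores triples with TransE, a standard KGE scoring function. The averaging rule, the choice of TransE, the function names, and the toy entities are all illustrative assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def fuse_modalities(embeddings: Sequence[Optional[Vector]]) -> Vector:
    """Average whichever modality embeddings are present (None = missing).

    Assumption: all modalities share one embedding space, so a plain
    mean is meaningful. This is an illustrative fusion rule only.
    """
    present = [e for e in embeddings if e is not None]
    if not present:
        raise ValueError("entity has no available modality")
    dim = len(present[0])
    return [sum(e[i] for e in present) / len(present) for i in range(dim)]


def transe_score(head: Vector, relation: Vector, tail: Vector) -> float:
    """TransE plausibility: negative L2 distance between head + relation and tail."""
    return -math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


def rank_tails(
    head: Vector, relation: Vector, candidates: Dict[str, Vector]
) -> List[str]:
    """Link prediction: order candidate tails from most to least plausible."""
    return sorted(
        candidates,
        key=lambda name: transe_score(head, relation, candidates[name]),
        reverse=True,
    )


# Hypothetical toy entities: a painting with image and text embeddings,
# and two artists that only have text embeddings (non-uniform modalities).
painting = fuse_modalities([[1.0, 0.0], [0.0, 1.0]])
artist_a = fuse_modalities([None, [1.0, 1.0]])
artist_b = fuse_modalities([None, [-1.0, 0.0]])
created_by = [0.5, 0.5]  # hypothetical relation embedding

ranking = rank_tails(
    painting, created_by, {"artist_a": artist_a, "artist_b": artist_b}
)
print(ranking)
```

In this toy setup the painting's fused embedding plus the relation vector lands exactly on `artist_a`, so that candidate ranks first. A trained model would learn the relation and entity representations rather than use hand-picked vectors.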

Taxonomy

Topics: Advanced Graph Neural Networks · Multimodal Machine Learning Applications · Topic Modeling