Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning

Félix Lefebvre; Gaël Varoquaux

arXiv:2507.00965·cs.LG·March 18, 2026

TL;DR

SEPAL is a scalable algorithm that produces high-quality knowledge graph embeddings for downstream tasks by optimizing a small core and propagating embeddings, outperforming previous methods on large graphs.

Contribution

Introduces SEPAL, a scalable embedding method that ensures global consistency and reduces engineering effort for large knowledge graphs.

Findings

01 SEPAL outperforms previous methods on 46 downstream tasks.
02 SEPAL scales to huge knowledge graphs on commodity hardware.
03 Embedding quality improves with the core-based propagation approach.

Abstract

Many machine learning tasks can benefit from external knowledge. Large knowledge graphs store such knowledge, and embedding methods can be used to distill it into ready-to-use vector representations for downstream applications. For this purpose, however, current models have two limitations: they are primarily optimized for link prediction, via local contrastive learning, and their application to the largest graphs requires significant engineering effort due to GPU memory limits. To address these limitations, we introduce SEPAL: a Scalable Embedding Propagation ALgorithm for large knowledge graphs, designed to produce high-quality embeddings for downstream tasks at scale. The key idea of SEPAL is to ensure global embedding consistency by optimizing embeddings only on a small core of entities, and then propagating them to the rest of the graph with message passing. We evaluate SEPAL on 7 large-scale…
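
The core-then-propagate idea in the abstract can be illustrated with a toy sketch. This is not SEPAL's implementation: the abstract does not give the update rule, so plain neighbor averaging that ignores relation types is swapped in as a stand-in, and the function name `propagate_from_core`, the `n_iters` parameter, and the four-node graph are all hypothetical. The core embeddings are assumed to have been optimized already and are kept fixed.

```python
from collections import defaultdict


def propagate_from_core(edges, core_embeddings, n_iters=10):
    """Spread fixed core embeddings to the other nodes by neighbor averaging.

    Assumption: this averaging rule is a simplified stand-in for message
    passing; it ignores relation types and is not the paper's update rule.
    """
    # Build an undirected adjacency map from (head, tail) pairs.
    neighbors = defaultdict(set)
    for head, tail in edges:
        neighbors[head].add(tail)
        neighbors[tail].add(head)

    # Start with the core only; outer nodes are filled in as messages arrive.
    embeddings = {node: list(vec) for node, vec in core_embeddings.items()}

    for _ in range(n_iters):
        updates = {}
        for node in neighbors:
            if node in core_embeddings:
                continue  # core embeddings stay fixed
            known = [embeddings[n] for n in neighbors[node] if n in embeddings]
            if known:
                dim = len(known[0])
                updates[node] = [
                    sum(vec[i] for vec in known) / len(known) for i in range(dim)
                ]
        # Apply all updates at once so each round uses the previous round's values.
        embeddings.update(updates)
    return embeddings


# Hypothetical path graph a - b - c - d, with a and d forming the core.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
core = {"a": [1.0, 0.0], "d": [0.0, 1.0]}
result = propagate_from_core(edges, core)
```

In this toy graph, the outer nodes `b` and `c` end up with embeddings that interpolate between the two core nodes, with `b` closer to `a` and `c` closer to `d`. Only the small core would need gradient-based optimization; the propagation step itself is cheap averaging over edges.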

Code & Models

Datasets

inria-soda/sepal-datasets
dataset · 50 dl

Taxonomy

Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Big Data and Digital Economy