gpuRDF2vec -- Scalable GPU-based RDF2vec
Martin Böckling, Heiko Paulheim

TL;DR
gpuRDF2vec is a GPU-accelerated library that substantially speeds up the generation of large-scale RDF2vec knowledge graph embeddings, making RDF2vec practical on web-scale data.
Contribution
The paper introduces gpuRDF2vec, a scalable, open-source, GPU-based implementation of RDF2vec that supports multi-node execution and outperforms existing tools in speed and scalability.
Findings
Achieves a substantial speedup over jRDF2vec, the fastest existing alternative.
Single-node walk extraction outperforms pyRDF2vec, SparkKGML, and jRDF2vec.
Scales well to longer walks, which typically yield better-quality embeddings.
Abstract
Generating Knowledge Graph (KG) embeddings at web scale remains challenging. Among existing techniques, RDF2vec combines effectiveness with strong scalability. We present gpuRDF2vec, an open-source library that harnesses modern GPUs and supports multi-node execution to accelerate every stage of the RDF2vec pipeline. Extensive experiments on both synthetically generated graphs and real-world benchmarks show that gpuRDF2vec achieves a substantial speedup over the currently fastest alternative, jRDF2vec. In a single-node setup, our walk-extraction phase alone outperforms pyRDF2vec, SparkKGML, and jRDF2vec by a substantial margin when using random walks on large, dense graphs, and it scales very well to longer walks, which typically lead to better-quality embeddings. Our implementation of gpuRDF2vec enables practitioners and researchers to train high-quality KG embeddings on large-scale…
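The abstract singles out random-walk extraction but does not describe how gpuRDF2vec implements it. The following is a minimal, CPU-only Python sketch under our own assumptions (integer-encoded triples, uniform sampling of outgoing edges, one walk per start entity) of the data layout such a pipeline commonly relies on: a CSR (compressed sparse row) adjacency and hop-by-hop advancement of all walks. RDF2vec walks alternate entities and predicates. `build_csr`, `random_walks` and `to_sentences` are hypothetical names, not part of gpuRDF2vec's API.

```python
import random
from typing import List, Sequence, Tuple

Triple = Tuple[int, int, int]  # (subject, predicate, object) as integer ids


def build_csr(triples: Sequence[Triple], num_entities: int):
    """Group outgoing edges by subject into CSR arrays: offsets, preds, objs."""
    counts = [0] * num_entities
    for s, _, _ in triples:
        counts[s] += 1
    offsets = [0] * (num_entities + 1)
    for e in range(num_entities):
        offsets[e + 1] = offsets[e] + counts[e]
    fill = list(offsets[:-1])  # next free slot for each subject
    preds = [0] * len(triples)
    objs = [0] * len(triples)
    for s, p, o in triples:
        preds[fill[s]] = p
        objs[fill[s]] = o
        fill[s] += 1
    return offsets, preds, objs


def random_walks(offsets, preds, objs, starts: Sequence[int],
                 depth: int, seed: int = 0) -> List[List[int]]:
    """One uniform random walk per start entity; all walks advance hop by hop."""
    rng = random.Random(seed)
    walks = [[s] for s in starts]  # layout: entity, predicate, entity, ...
    current = list(starts)
    alive = [True] * len(starts)
    for _ in range(depth):
        # Every live walk takes one hop independently of the others, so this
        # inner loop is data-parallel (e.g. one GPU thread per walk).
        for i, node in enumerate(current):
            if not alive[i]:
                continue
            lo, hi = offsets[node], offsets[node + 1]
            if lo == hi:  # dead end: the entity has no outgoing edges
                alive[i] = False
                continue
            k = rng.randrange(lo, hi)  # choose an outgoing edge uniformly
            walks[i].extend((preds[k], objs[k]))
            current[i] = objs[k]
    return walks


def to_sentences(walks: List[List[int]]) -> List[List[str]]:
    """Render walks as string tokens (e<id> for entities, p<id> for predicates)."""
    return [[("e" if pos % 2 == 0 else "p") + str(tok) for pos, tok in enumerate(w)]
            for w in walks]


triples = [(0, 0, 1), (1, 1, 2), (2, 0, 0), (0, 1, 2), (2, 1, 3)]
offsets, preds, objs = build_csr(triples, num_entities=4)
walks = random_walks(offsets, preds, objs, starts=[0, 1, 2, 3], depth=3, seed=42)
for sentence in to_sentences(walks):
    print(" ".join(sentence))
```

In this toy graph entity 3 has no outgoing edges, so its walk is just `["e3"]`. In RDF2vec, token sequences like these are then fed to a word2vec model to learn the embeddings. Raising `depth` lengthens each sequence by two tokens per hop, which is why walk length matters for extraction cost.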
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Data Quality and Management
