Column-Oriented Datalog on the GPU

Yihao Sun; Sidharth Kumar; Thomas Gilray; Kristopher Micinski

arXiv:2501.13051 · cs.DB · January 23, 2025

TL;DR

This paper introduces VFLog, a column-oriented Datalog engine optimized for GPUs. It reports speedups of over 200x against state-of-the-art CPU-based engines and 2.5x against existing GPU Datalog engines, on workloads that include knowledge representation and reasoning.

Contribution

The paper presents the first column-oriented Datalog engine tailored to modern GPUs, exploiting their massive thread parallelism and high memory bandwidth.

Findings

01. Over 200x faster than state-of-the-art CPU engines

02. 2.5x faster than existing GPU Datalog engines

03. Effective for knowledge representation and reasoning workloads

Abstract

Datalog is a logic programming language widely used in knowledge representation and reasoning (KRR), program analysis, and social media mining due to its expressiveness and high performance. Traditionally, Datalog engines use either row-oriented or column-oriented storage. Engines like VLog and Nemo favor column-oriented storage for efficiency on resource-constrained machines, while row-oriented engines like Souffle use advanced data structures with locking to perform better on multi-core CPUs. The advent of modern datacenter GPUs, such as the NVIDIA H100, which can run over 16k threads simultaneously and offers high memory bandwidth, has reopened the debate on which storage layout is more effective. This paper presents the first column-oriented Datalog engine tailored to the strengths of modern GPUs. We present VFLog, a CUDA-based Datalog runtime library with a column-oriented GPU…
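
To make the storage-layout distinction in the abstract concrete, the following is a minimal sketch in plain Python of one relation held row-wise and column-wise, with a two-hop join evaluated over the columns. It is an illustration under stated assumptions, not VFLog's data structure or API: the relation `edge`, the derived relation `path2`, and the helper `join_paths` are hypothetical names chosen for this example, and the join runs sequentially on the CPU rather than on a GPU.

```python
# Hypothetical illustration only; not VFLog's actual data structure.

# Row-oriented: one tuple per fact of edge(x, y).
edge_rows = [(1, 2), (2, 3), (2, 4), (3, 5)]

# Column-oriented: one array per attribute, aligned by position.
edge_x = [x for x, _ in edge_rows]
edge_y = [y for _, y in edge_rows]


def join_paths(src, dst):
    """Evaluate path2(x, z) :- edge(x, y), edge(y, z) over two columns.

    Returns the result as two aligned output columns (xs, zs).
    """
    # Index the source column: value -> positions where it occurs.
    index = {}
    for pos, value in enumerate(src):
        index.setdefault(value, []).append(pos)

    out = set()  # a set removes duplicate derived facts
    for i in range(len(src)):
        # Probe the index with y from fact i; each match supplies a z.
        for j in index.get(dst[i], []):
            out.add((src[i], dst[j]))

    # Sort the derived facts, then split them back into columns.
    pairs = sorted(out)
    return [x for x, _ in pairs], [z for _, z in pairs]


path_x, path_z = join_paths(edge_x, edge_y)
```

In the columnar form, each attribute is a contiguous array, so a scan or probe touches only the columns a rule actually mentions. That access pattern is the general reason columnar layouts are attractive on high-bandwidth hardware; how VFLog organizes its columns on the GPU is not described in the truncated abstract above.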

Taxonomy

Topics: Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques