# gLeiden: accelerated community detection algorithms using directed and undirected graphs on GPUs

**Authors:** Beenish Gul, Maria Murach, Stefan Bekarinov, Kevin Skadron

PMC · DOI: 10.1093/bioadv/vbaf327 · Bioinformatics Advances · 2026-01-27

## TL;DR

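gLeiden is a CUDA C++ GPU implementation of the Leiden community detection algorithm for large biological datasets such as scRNA-seq and mass cytometry. It supports directed graphs, and the authors report 11× and 12× speedups over directed cLeiden and a 58% speedup over cuGraph on large datasets.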

## Contribution

To the authors' knowledge, gLeiden is the first GPU implementation of the Leiden algorithm that supports directed graphs. It is written in CUDA C++ rather than a blend of Python and C.

## Key findings

- Directed gLeiden achieves 11× and 12× speedups over directed cLeiden on very large datasets.
- The undirected implementations (ucLeiden and ugLeiden) outperform the original Java version by up to 42× on large datasets.
- Undirected ugLeiden performs comparably to cuGraph on smaller datasets and is 58% faster on larger datasets.

## Abstract

Community detection methods are applied to single-cell RNA sequencing (scRNA-seq) and mass cytometry data to identify major cell types and their subtypes efficiently, but their computational demands grow with the substantial increase in dataset sizes. The Leiden algorithm, an emerging method in this field, offers inherent parallelism that remains underutilized because of the limited parallel processing capabilities of modern multi-core CPUs, which have fewer than 100 cores (typically 32–64). Leiden can, however, achieve significant performance gains when implemented on GPUs, which offer high memory bandwidth and an extensive array of parallel processing units that map well to the parallelism in Leiden. As far as we know, cuGraph is the only implementation that has mapped the Leiden algorithm to GPUs, using a blend of Python and C. However, it supports only undirected graphs, potentially discarding the valuable information carried by edge directionality. In addition, this Python implementation for GPUs is slower than a C/C++ based implementation, which reduces the performance gains available from GPU acceleration. A C/C++ based implementation, by contrast, optimizes performance more effectively and provides an accurate baseline for measuring GPU acceleration.
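To make the point about edge directionality concrete, the following is a minimal sketch, not taken from the paper or from cuGraph, of what happens when a directed edge list is collapsed into an undirected one. The `Edge` struct and the `symmetrize` function are hypothetical names introduced only for this illustration, and summing the weights of opposite edges is one assumed symmetrization rule among several possible ones.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical edge type for illustration: a weighted edge from src to dst.
struct Edge {
    int src;
    int dst;
    double weight;
};

// Collapse a directed edge list into an undirected one.
// Edges (u, v) and (v, u) are merged under the key (min(u, v), max(u, v))
// and their weights are summed, so the direction is no longer recoverable.
std::map<std::pair<int, int>, double> symmetrize(const std::vector<Edge>& directed) {
    std::map<std::pair<int, int>, double> undirected;
    for (const Edge& e : directed) {
        assert(e.src >= 0 && e.dst >= 0);  // vertex ids are assumed non-negative
        std::pair<int, int> key{std::min(e.src, e.dst), std::max(e.src, e.dst)};
        undirected[key] += e.weight;
    }
    return undirected;
}
```

Under this rule, the path 0 → 1 → 2 and the reversed path 2 → 1 → 0 produce the same undirected graph, so an algorithm restricted to undirected input cannot tell them apart.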

We developed gLeiden, a lightweight CUDA C++ GPU implementation of the Leiden algorithm and, to the best of our knowledge, the first GPU implementation that supports directed graphs, which generally demand nearly twice the computational time and memory of undirected graphs. The results show that directed gLeiden outperforms directed cLeiden, with 11× and 12× speedups on very large datasets. Our undirected ucLeiden and ugLeiden implementations significantly outperform the original Java version, with up to 42× speedup on large datasets. Compared with cuGraph, undirected ugLeiden performs comparably on smaller datasets and is 58% faster on larger datasets. These results position our GPU-based Leiden implementation as a high-performance alternative to existing state-of-the-art community detection tools.
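As a rough illustration of why directed graphs need extra bookkeeping, the sketch below scores a partition with the directed modularity of Leicht and Newman, $Q = \sum_c \left[ \frac{w_c}{m} - \frac{K_c^{\text{out}} K_c^{\text{in}}}{m^2} \right]$, where $w_c$ is the edge weight inside community $c$, $K_c^{\text{out}}$ and $K_c^{\text{in}}$ are its total out- and in-weights, and $m$ is the total edge weight. This is an assumption: the summary does not say which quality function gLeiden optimizes, and the code is a sequential CPU sketch, not the authors' GPU kernel. `DirectedEdge` and `directedModularity` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical edge type for illustration: a weighted edge from src to dst.
struct DirectedEdge {
    int src;
    int dst;
    double weight;
};

// Directed modularity of a partition (assumed quality function, see lead-in).
// community[v] is the community id of vertex v, in [0, numCommunities).
double directedModularity(const std::vector<DirectedEdge>& edges,
                          const std::vector<int>& community,
                          int numCommunities) {
    // A directed graph needs separate out- and in-totals per community,
    // where an undirected graph would need a single degree total.
    std::vector<double> internal(numCommunities, 0.0);
    std::vector<double> outTotal(numCommunities, 0.0);
    std::vector<double> inTotal(numCommunities, 0.0);
    double m = 0.0;  // total edge weight

    for (const DirectedEdge& e : edges) {
        assert(e.src >= 0 && static_cast<std::size_t>(e.src) < community.size());
        assert(e.dst >= 0 && static_cast<std::size_t>(e.dst) < community.size());
        const int cs = community[e.src];
        const int cd = community[e.dst];
        assert(cs >= 0 && cs < numCommunities && cd >= 0 && cd < numCommunities);
        outTotal[cs] += e.weight;
        inTotal[cd] += e.weight;
        if (cs == cd) internal[cs] += e.weight;
        m += e.weight;
    }
    if (m == 0.0) return 0.0;  // no edges: define the score as zero

    double q = 0.0;
    for (int c = 0; c < numCommunities; ++c) {
        q += internal[c] / m - (outTotal[c] * inTotal[c]) / (m * m);
    }
    return q;
}
```

For two directed 3-cycles joined by a single edge and split into their natural two communities, this evaluates to 18/49, while placing all six vertices in one community gives 0. The paired `outTotal` and `inTotal` arrays are one plausible source of the additional memory that directed graphs require, though the paper's own accounting may differ.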

The source code and sample data are available at: https://github.com/Beenishgul/Leiden and https://figshare.com/s/3b51e463a56e2a374bdf

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12987761/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12987761/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/PMC12987761/full.md

---
Source: https://tomesphere.com/paper/PMC12987761