I/O-Efficient Generation of Massive Graphs Following the LFR Benchmark
Michael Hamann, Ulrich Meyer, Manuel Penschuck, Hung Tran, Dorothea Wagner

TL;DR
This paper introduces EM-LFR, an external memory algorithm that efficiently generates massive LFR benchmark graphs with more than 37 billion edges on a single machine. It outperforms existing internal memory implementations and remains competitive with distributed methods.
Contribution
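The paper presents EM-HH and EM-ES, two novel I/O-efficient external memory algorithms for the two steps of generating random graphs with prescribed degree sequences: deterministic materialization with the Havel-Hakimi algorithm and subsequent randomization. It also proposes EM-CM/ES, an alternative sampling scheme that combines the Configuration Model with rewiring steps. Together they make it possible to generate LFR benchmark graphs of extreme size.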
Findings
Able to generate graphs with over 37 billion edges on a single machine
Faster than state-of-the-art internal memory implementations
Competitive with distributed algorithms in performance
Abstract
LFR is a popular benchmark graph generator used to evaluate community detection algorithms. We present EM-LFR, the first external memory algorithm able to generate massive complex networks following the LFR benchmark. Its most expensive component is the generation of random graphs with prescribed degree sequences which can be divided into two steps: the graphs are first materialized deterministically using the Havel-Hakimi algorithm, and then randomized. Our main contributions are EM-HH and EM-ES, two I/O-efficient external memory algorithms for these two steps. We also propose EM-CM/ES, an alternative sampling scheme using the Configuration Model and rewiring steps to obtain a random simple graph. In an experimental evaluation we demonstrate their performance; our implementation is able to handle graphs with more than 37 billion edges on a single machine, is competitive with a massive…
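The abstract describes two ways to sample a random simple graph with a prescribed degree sequence. The Python sketch that follows is a hypothetical, purely in-memory illustration of those steps under our own assumptions. `havel_hakimi` materializes a graph deterministically, `edge_switching` randomizes it with degree-preserving edge switches, and `configuration_model` plus `rewire_to_simple` illustrate the Configuration-Model-then-rewire alternative. All function names and parameters are hypothetical. The sketch does not reproduce EM-HH, EM-ES or EM-CM/ES, whose contribution is performing these steps I/O-efficiently in external memory, and it omits the LFR-specific community assignment entirely.

```python
import random
from collections import Counter

# Hypothetical in-memory sketch only; EM-HH, EM-ES and EM-CM/ES perform the
# same logical steps with I/O-efficient external-memory techniques instead.


def havel_hakimi(degrees):
    """Deterministically build a simple graph realizing `degrees`.

    Returns edges (u, v) with u < v; raises ValueError if not graphical."""
    residual = list(degrees)
    edges = []
    while True:
        # Connect the node with the largest residual degree d to the d nodes
        # with the next-largest residual degrees.
        order = sorted(range(len(residual)), key=lambda v: -residual[v])
        u, d = order[0], residual[order[0]]
        if d == 0:
            return edges
        if d >= len(order) or residual[order[d]] == 0:
            raise ValueError("degree sequence is not graphical")
        residual[u] = 0
        for v in order[1:d + 1]:
            residual[v] -= 1
            edges.append((min(u, v), max(u, v)))


def propose_switch(e, f, rng):
    """Swap endpoints of edges e and f; every node keeps its degree."""
    (a, b), (c, d) = e, f
    g, h = ((a, d), (c, b)) if rng.random() < 0.5 else ((a, c), (b, d))
    return tuple(sorted(g)), tuple(sorted(h))


def is_legal(g, h, present):
    """Reject switches that would create a self-loop or a multi-edge."""
    return (g[0] != g[1] and h[0] != h[1] and g != h
            and not present(g) and not present(h))


def edge_switching(edges, num_switches, rng):
    """Randomize a simple graph by degree-preserving edge switches."""
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(num_switches):
        i, j = rng.sample(range(len(edges)), 2)
        g, h = propose_switch(edges[i], edges[j], rng)
        if not is_legal(g, h, lambda e: e in edge_set):
            continue  # rejected switch: the graph stays unchanged
        edge_set -= {edges[i], edges[j]}
        edge_set |= {g, h}
        edges[i], edges[j] = g, h
    return edges


def configuration_model(degrees, rng):
    """Random multigraph: pair degree 'stubs' uniformly at random."""
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    if len(stubs) % 2:
        raise ValueError("degree sum must be even")
    rng.shuffle(stubs)
    return [tuple(sorted(p)) for p in zip(stubs[::2], stubs[1::2])]


def rewire_to_simple(edges, rng, max_attempts=100_000):
    """Switch self-loops and multi-edges away until the graph is simple."""
    edges = list(edges)
    counts = Counter(edges)  # multiset of edges; missing keys read as 0
    for _ in range(max_attempts):
        bad = [i for i, e in enumerate(edges) if e[0] == e[1] or counts[e] > 1]
        if not bad:
            return edges
        i, j = rng.choice(bad), rng.randrange(len(edges))
        if i == j:
            continue
        g, h = propose_switch(edges[i], edges[j], rng)
        if not is_legal(g, h, lambda e: counts[e] > 0):
            continue
        counts[edges[i]] -= 1
        counts[edges[j]] -= 1
        counts[g] += 1
        counts[h] += 1
        edges[i], edges[j] = g, h
    raise RuntimeError("could not remove all illegal edges")


def degrees_of(edges, n):
    """Degree of each node; a self-loop contributes 2 to its node."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


if __name__ == "__main__":
    rng = random.Random(42)
    degs = [2] * 20 + [1] * 10
    g = edge_switching(havel_hakimi(degs), 10 * len(degs), rng)
    s = rewire_to_simple(configuration_model(degs, rng), rng)
    assert degrees_of(g, len(degs)) == degs == degrees_of(s, len(degs))
    print(len(g), len(s))
```

Every switch replaces two edges by two edges on the same four endpoints, so both randomization loops preserve the degree sequence by construction, and rejected switches simply leave the graph unchanged. The number of switches used here (`10 * len(degs)`) is an arbitrary illustrative choice, not a value taken from the paper.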
