Generating Synthetic Citation Networks with Communities
Łukasz Brzozowski, Marek Gagolewski, Grzegorz Siudem

TL;DR
This paper compares methods for generating synthetic citation networks with community structure, proposes improvements to existing generators, and introduces the Citation Seeder algorithm for realistic and efficient network simulation.
Contribution
It systematically evaluates directed graph generators, proposes reversing edge directions to improve realism, and introduces the Citation Seeder algorithm based on the Price-Pareto model.
Findings
Reversing edge directions improves generator performance.
High-parameter models tend to overfit community statistics.
Citation Seeder achieves competitive results with fewer parameters.
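The edge-reversal finding can be illustrated with a minimal sketch. It assumes that node ids stand in for publication order (a larger id means a newer paper) and that every edge is re-oriented so the newer node cites the older one. The summary does not spell out the paper's exact procedure, so this ordering rule and the helper names `random_directed_edges`, `orient_edges_by_age`, and `is_acyclic` are hypothetical. A uniform random digraph stands in for the output of a static generator such as a Stochastic Block Model.

```python
import random


def random_directed_edges(n, m, seed=0):
    """Stand-in for a static generator: m distinct random directed edges on n nodes."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.add((u, v))
    return sorted(edges)


def orient_edges_by_age(edges):
    """Re-orient edges so the newer node (larger id) always cites the older one.

    Assumption: node ids encode publication order. Self-loops are dropped and
    reciprocal pairs collapse into one edge, so the edge count may shrink.
    """
    oriented = set()
    for u, v in edges:
        if u != v:
            oriented.add((max(u, v), min(u, v)))
    return sorted(oriented)


def is_acyclic(n, edges):
    """Kahn's algorithm: True if the directed graph on nodes 0..n-1 has no cycle."""
    indegree = [0] * n
    successors = [[] for _ in range(n)]
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    ready = [i for i in range(n) if indegree[i] == 0]
    visited = 0
    while ready:
        u = ready.pop()
        visited += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return visited == n


if __name__ == "__main__":
    raw = random_directed_edges(30, 120, seed=1)
    dag = orient_edges_by_age(raw)
    print("acyclic before:", is_acyclic(30, raw))
    print("acyclic after: ", is_acyclic(30, dag))
```

Because every re-oriented edge points from a larger id to a smaller one, the result is acyclic by construction. Community labels assigned by the generator are unaffected, since only edge directions change.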
Abstract
Generating realistic synthetic citation, patent, or component dependency networks is essential for benchmarking community detection, graph visualisation, and network data mining algorithms. We present the first systematic comparison of generators of directed graphs that are nearly acyclic and have a ground-truth community structure. We evaluate 12 methods across 7 real citation networks and 26 metrics. We propose the practice of reversing directions of edges in static generators to break cycles and induce a citation-like flow, which significantly improves the performance of a degree-corrected Stochastic Block Model. Our novel methodological approach to evaluating community detection benchmarks distinguishes between endogenous and exogenous mesoscopic similarities, with the latter proving more important. This distinction reveals that high-parameter models suffer from overfitting by…
