GraphMaker: Can Diffusion Models Generate Large Attributed Graphs?
Mufei Li, Eleonora Kreačić, Vamsi K. Potluru, Pan Li

TL;DR
GraphMaker is a novel diffusion model that effectively generates large attributed graphs, capturing complex attribute-structure relationships and enabling data sharing for graph machine learning tasks.
Contribution
The paper introduces GraphMaker, a diffusion-based approach tailored for large attributed graph generation, addressing scalability and complex attribute-structure correlations.
Findings
Synthetic graphs enable competitive ML models without access to original data
Asynchronous generation captures attribute-structure correlations more effectively
Edge mini-batching improves scalability
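To illustrate why edge mini-batching helps with scale, here is a minimal sketch, not the authors' implementation. A graph with N nodes has N(N-1)/2 candidate node pairs, so scoring all of them at once is quadratic in memory; scoring them in fixed-size chunks keeps memory bounded by the batch size. The names `edge_minibatches`, `score_pair`, and `predict_edges` are hypothetical, and `score_pair` is a toy stand-in for a learned edge predictor.

```python
from itertools import combinations, islice
from typing import Iterator, List, Tuple

Pair = Tuple[int, int]


def edge_minibatches(num_nodes: int, batch_size: int) -> Iterator[List[Pair]]:
    """Yield candidate node pairs (i, j) with i < j in chunks of at most batch_size."""
    # combinations() is lazy, so the full list of N*(N-1)/2 pairs is never materialized.
    pairs = combinations(range(num_nodes), 2)
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch


def score_pair(i: int, j: int) -> float:
    """Hypothetical stand-in for a learned edge predictor (assumption, not from the paper)."""
    return 1.0 if (i + j) % 3 == 0 else 0.0


def predict_edges(num_nodes: int, batch_size: int) -> List[Pair]:
    """Score candidate pairs one mini-batch at a time and keep pairs scored above 0.5."""
    edges: List[Pair] = []
    for batch in edge_minibatches(num_nodes, batch_size):
        edges.extend((i, j) for i, j in batch if score_pair(i, j) > 0.5)
    return edges
```

Only one batch of pairs is held in memory at a time, so peak memory depends on `batch_size` rather than on the square of the number of nodes.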
Abstract
Large-scale graphs with node attributes are increasingly common in various real-world applications. Creating synthetic, attribute-rich graphs that mirror real-world examples is crucial, especially for sharing graph data for analysis and developing learning models when the original data cannot be shared. Traditional graph generation methods are limited in their capacity to handle these complex structures. Recent advances in diffusion models have shown potential in generating graph structures without attributes and smaller molecular graphs. However, these models face challenges in generating large attributed graphs due to the complex attribute-structure correlations and the large size of these graphs. This paper introduces a novel diffusion model, GraphMaker, specifically designed for generating large attributed graphs. We explore various combinations of node attribute and graph…
Peer Reviews
Decision: Submitted to ICLR 2024
- S1. Generation of large attributed graphs presents significant technical challenges that are worth investigating.
- S2. Writing is generally clear, although there are some clarity or motivational issues (see W1 and W2). But overall, it is easy to read and follow.
- S3. I like the evaluation methodology using ML models, which can be more expressive than traditional statistics, yet is general without requiring domain-specific knowledge.
- W1. Some technical details are not clearly introduced, especially in 3.5. It was only mentioned that "which generates node attributes through an external approach conditioned on node labels." How does this external approach/node label conditioning work exactly? What kind of label is suitable for this purpose? How are these node labels related to Y in 3.2? These are not clearly explained.
- W2. Motivation of the decoupled approach is not well articulated, and the choice of the word "decoup
1. A neat figure describes GraphMaker-Sync and GraphMaker-Async, which makes the method easier to understand.
2. Presents a method for generating large attributed graphs given node labels.
1. The task of predicting node labels and edge existence given the node attributes is more like a link prediction task than a graph generation task. Is there any specific reason or reference that defines the task as a graph generation task?
2. What is the novelty of the proposed model? Diffusion-based graph generative models such as GDSS can also deal with attributed graphs. Also, for scalability, simply using an MPNN as the encoder does not seem to be a critical novelty point.
3. Hard to under
The main strength of this paper is that there are few graph generative models that can learn attributed-graph distributions in the large-graph setting. The only existing one (that I know of, which the authors cite) is Yoon et al. 2023 and appears to not actually generate the whole graph, but rather only batch-level training samples of rooted trees on which GNNs can be trained effectively. The fact that GraphMaker can generate a real graph sample with attributes at the same scale as the input dat
The main weakness of the paper is that the empirical results show that the proposed method only marginally outperforms the SBM in the evaluation aspects (graph property, discriminative, benchmarking). While the GraphMaker graphs seem to match graph statistics slightly better (in aggregate), and also better align ranks in benchmarking, the discriminative aspect (Table 2) shows that GNN models trained on synthetic graphs from GraphMaker vs those from SBM do about the same when trained on the sourc
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Graph Neural Networks · Caching and Content Delivery
Methods: Diffusion
