Data-Centric Learning from Unlabeled Graphs with Diffusion Model
Gang Liu, Eric Inae, Tong Zhao, Jiaxin Xu, Tengfei Luo, Meng Jiang

TL;DR
This paper introduces a data-centric approach using diffusion models to leverage unlabeled graphs for property prediction, generating task-specific labeled examples that improve performance over traditional self-supervised methods.
Contribution
The paper proposes a novel diffusion-based method to extract and utilize knowledge from unlabeled graphs by generating labeled graph examples tailored to each prediction task.
Findings
Outperforms 15 existing methods on 15 tasks
Generated labeled examples improve prediction accuracy
Knowledge extracted from unlabeled graphs as generated examples improves performance beyond self-supervised pre-training
Abstract
Graph property prediction tasks are important and numerous. While each task offers only a small number of labeled examples, unlabeled graphs have been collected from various sources and at a large scale. A conventional approach is to train a model on the unlabeled graphs with self-supervised tasks and then fine-tune it on the prediction tasks. However, the knowledge learned from the self-supervised tasks may not align with, and sometimes conflicts with, what the predictions need. In this paper, we propose to extract the knowledge underlying the large set of unlabeled graphs as a specific set of useful data points to augment each property prediction model. We use a diffusion model to fully utilize the unlabeled graphs and design two new objectives that guide the model's denoising process with each task's labeled data to generate task-specific graph examples and their labels. Experiments demonstrate that…
Taxonomy
Topics: Advanced Graph Neural Networks · Topic Modeling · Machine Learning and Data Classification
Methods: Diffusion
