Better with Less: A Data-Active Perspective on Pre-Training Graph Neural Networks

Jiarong Xu; Renhong Huang; Xin Jiang; Yuxuan Cao; Carl Yang; Chunping Wang; Yang Yang

arXiv:2311.01038·cs.LG·November 22, 2023·5 cites


TL;DR

This paper introduces a data-active graph pre-training framework that selects the most representative data points to improve GNN pre-training efficiency and downstream performance, challenging the notion that more data always yields better results.

Contribution

It proposes the APT framework, which uses a graph selector and predictive uncertainty to choose optimal data, leading to more effective pre-training with less data.
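As a rough illustration of the selection idea described above, the following is a minimal sketch, not the authors' implementation. It assumes each candidate graph already has two precomputed scores: a predictive-uncertainty score from the current pre-training model and a score summarizing the graph's inherent properties. The function name `select_graphs`, the linear weighting, and the `weight` parameter are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def select_graphs(
    uncertainty: Sequence[float],
    property_score: Sequence[float],
    budget: int,
    weight: float = 0.5,
) -> List[int]:
    """Return indices of the `budget` highest-scoring candidate graphs.

    Hypothetical sketch: the combined score is a weighted sum of the two
    inputs. The actual APT selection criterion may differ.
    """
    if len(uncertainty) != len(property_score):
        raise ValueError("score lists must have the same length")
    if budget < 0:
        raise ValueError("budget must be non-negative")

    # Combine the two criteria into a single score per candidate graph.
    combined = [
        weight * u + (1.0 - weight) * p
        for u, p in zip(uncertainty, property_score)
    ]

    # Rank candidates from highest to lowest combined score.
    ranked = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)

    # Keep only as many graphs as the budget allows.
    return ranked[:budget]


if __name__ == "__main__":
    # Three candidate graphs with made-up scores, selecting two of them.
    chosen = select_graphs([0.9, 0.1, 0.5], [0.2, 0.3, 0.8], budget=2)
    print(chosen)
```

In an iterative setup, a routine like this would be called once per round: the model is pre-trained on the graphs chosen so far, uncertainty scores are recomputed, and the next batch is selected.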

Findings

01 Achieves better downstream performance with fewer training samples.

02 Demonstrates the effectiveness of data selection over data quantity.

03 Provides a progressive, iterative pre-training process.

Abstract

Pre-training on graph neural networks (GNNs) aims to learn transferable knowledge for downstream tasks with unlabeled data, and it has recently become an active research area. The success of graph pre-training models is often attributed to the massive amount of input data. In this paper, however, we identify the curse of big data phenomenon in graph pre-training: more training data do not necessarily lead to better downstream performance. Motivated by this observation, we propose a better-with-less framework for graph pre-training: fewer, but carefully chosen data are fed into a GNN model to enhance pre-training. The proposed pre-training pipeline is called the data-active graph pre-training (APT) framework, and is composed of a graph selector and a pre-training model. The graph selector chooses the most representative and instructive data points based on the inherent properties of…

Code & Models

Repositories

galina0217/apt (PyTorch, official)

Taxonomy

Topics: Advanced Graph Neural Networks · Topic Modeling · Machine Learning in Materials Science