GraphSculptor: Sculpting Pre-training Coreset for Graph Self-supervised Learning

Chuang Liu; Zelin Yao; Xueqi Ma; Luzhi Wang; Mukun Chen; Pinghua Xu; Wenbin Hu

arXiv:2605.01310·cs.LG·May 5, 2026


TL;DR

GraphSculptor introduces a label-free coreset construction method for graph self-supervised learning, significantly reducing data and computational requirements while maintaining high downstream performance.

Contribution

The paper proposes a label-free coreset construction approach that combines structural and semantic diversity, with theoretical guarantees and practical efficiency.

Findings

01. A 10% coreset retains 99.6% of full-data performance.

02. Pre-training time is reduced by nearly 90%.

03. The method outperforms existing approaches in data efficiency.

Abstract

Graph self-supervised learning typically relies on large-scale unlabeled datasets, heavily inflating computational costs. However, empirical evidence suggests that these datasets contain substantial redundancy: our analysis reveals that uniformly subsampling 50% of graphs retains over 96% of downstream performance. To exploit this redundancy, we introduce GraphSculptor for pre-training coreset construction. Unlike methods dependent on additional training-time signals or limited solely to topological statistics, GraphSculptor provides a label-free solution that constructs coresets via two complementary perspectives: intrinsic structure and contextual semantics. Concretely, structural diversity is quantified using intrinsic graph statistics, yielding a structural feature vector for each graph, while semantic diversity is captured by utilizing a pre-trained language model to encode…
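As a rough illustration of two ideas the abstract mentions, the sketch below computes a per-graph structural feature vector from simple graph statistics and draws a uniform 50% subsample as a baseline. The specific statistics (node count, edge count, density, mean and maximum degree), the adjacency-map graph representation, and the function names `structural_features` and `uniform_subsample` are all assumptions made for illustration; the abstract does not list which statistics GraphSculptor uses, and this is not the authors' implementation. The semantic (language-model) side and the coreset selection rule are omitted because the abstract is truncated before describing them.

```python
import random
from typing import Dict, List, Set

# Assumed representation: an undirected graph as node -> set of neighbours.
Graph = Dict[int, Set[int]]


def structural_features(g: Graph) -> List[float]:
    """Hypothetical feature vector: [nodes, edges, density, mean degree, max degree]."""
    n = len(g)
    degrees = [len(nbrs) for nbrs in g.values()]
    m = sum(degrees) / 2  # each undirected edge appears in two adjacency sets
    density = (2 * m) / (n * (n - 1)) if n > 1 else 0.0
    mean_deg = sum(degrees) / n if n else 0.0
    max_deg = float(max(degrees)) if degrees else 0.0
    return [float(n), m, density, mean_deg, max_deg]


def uniform_subsample(graphs: List[Graph], fraction: float, seed: int = 0) -> List[int]:
    """Return sorted indices of a uniform random subset (the redundancy baseline)."""
    k = max(1, round(fraction * len(graphs)))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return sorted(rng.sample(range(len(graphs)), k))


# Toy dataset of four small graphs.
triangle: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path: Graph = {0: {1}, 1: {0, 2}, 2: {1}}
star: Graph = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
isolated: Graph = {0: set()}
dataset = [triangle, path, star, isolated]

features = [structural_features(g) for g in dataset]
subset = uniform_subsample(dataset, fraction=0.5)
```

Here `features` holds one vector per graph, which a diversity-aware selector could consume, while `subset` holds the indices of the randomly retained half of the dataset.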
