Graph-Aware Language Model Pre-Training on a Large Graph Corpus Can Help Multiple Graph Applications
Han Xie, Da Zheng, Jun Ma, Houyu Zhang, Vassilis N. Ioannidis, Xiang Song, Qing Ping, Sheng Wang, Carl Yang, Yi Xu, Belinda Zeng, Trishul Chilimbi

TL;DR
This paper introduces a novel pre-training framework combining language models and graph neural networks on large heterogeneous graph corpora, significantly improving performance across various downstream graph applications.
Contribution
According to the authors, this is the first study to pre-train combined text and graph models on large heterogeneous graphs and then fine-tune them on diverse downstream tasks with different graph schemas, demonstrating broad applicability.
Findings
Pre-training on large graph corpora enhances downstream task performance.
The proposed framework outperforms existing methods on multiple datasets.
Extensive experiments validate the effectiveness of the approach.
Abstract
Model pre-training on large text corpora has been demonstrated effective for various downstream applications in the NLP domain. In the graph mining domain, a similar analogy can be drawn for pre-training graph models on large graphs in the hope of benefiting downstream graph applications, which has also been explored by several recent studies. However, no existing study has ever investigated the pre-training of text plus graph models on large heterogeneous graphs with abundant textual information (a.k.a. large graph corpora) and then fine-tuning the model on different related downstream applications with different graph schemas. To address this problem, we propose a framework of graph-aware language model pre-training (GALM) on a large graph corpus, which incorporates large language models and graph neural networks, and a variety of fine-tuning methods on downstream applications. We…
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Natural Language Processing Techniques
