TL;DR
This paper presents a unified graph-language model (GLM) that uses the scale-free property of real-world graphs as a structural prior, approximates that prior with a k-nearest neighbor (KNN) graph, and applies the resulting graph to graph-based semi-supervised learning.
Contribution
It introduces a novel GLM framework that integrates graph generation and text embedding, using the scale-free property as a structural prior. This reduces reliance on artificial assumptions about the edge distribution and on extensive data annotations.
Findings
KNN graphs effectively approximate scale-free properties.
The integrated model improves semi-supervised learning performance.
Experimental results validate the structural assumptions and model effectiveness.
Abstract
Graph-language models (GLMs) have demonstrated great potential in graph-based semi-supervised learning. A typical GLM consists of two key stages: graph generation and text embedding, which are usually implemented by inferring a latent graph and finetuning a language model (LM), respectively. However, the former often relies on artificial assumptions about the underlying edge distribution, while the latter requires extensive data annotations. To tackle these challenges, this paper introduces a novel GLM that integrates graph generation and text embedding within a unified framework. Specifically, for graph generation, we leverage an inherent characteristic of real edge distribution--the scale-free property--as a structural prior. We unexpectedly find that this natural property can be effectively approximated by a simple k-nearest neighbor (KNN) graph. For text embedding, we develop a…
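To make the KNN construction mentioned in the abstract concrete, here is a minimal sketch of building a directed KNN graph from node embeddings and tabulating its in-degree distribution. It is an illustration under stated assumptions, not the authors' implementation: the random Gaussian vectors stand in for language-model text embeddings, cosine similarity is an assumed choice of metric, and the helper names (`cosine`, `knn_graph`, `in_degree_histogram`) are hypothetical. Whether in-degree is the quantity the paper analyzes is also an assumption here.

```python
import math
import random
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_graph(embeddings, k):
    """Directed KNN graph: node i points to its k most similar other nodes."""
    n = len(embeddings)
    edges = {}
    for i in range(n):
        # Score every other node; self-loops are excluded by j != i.
        scored = [(cosine(embeddings[i], embeddings[j]), j)
                  for j in range(n) if j != i]
        scored.sort(reverse=True)
        edges[i] = [j for _, j in scored[:k]]
    return edges


def in_degree_histogram(edges, n):
    """Return per-node in-degrees and a histogram {degree: node count}."""
    indeg = Counter(j for nbrs in edges.values() for j in nbrs)
    degrees = [indeg.get(i, 0) for i in range(n)]
    return degrees, Counter(degrees)


# Toy data (assumption): random vectors in place of real text embeddings.
random.seed(0)
n, dim, k = 200, 16, 5
emb = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]

edges = knn_graph(emb, k)
degrees, hist = in_degree_histogram(edges, n)

for d in sorted(hist):
    print(f"in-degree {d}: {hist[d]} nodes")
```

Every node has out-degree exactly k by construction, so any skew shows up only in the in-degrees. On real text embeddings one would inspect this histogram on a log-log scale to judge how heavy the tail is; the toy data above makes no claim about that.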
Peer Reviews
Decision: ICLR 2025 Poster
1. The authors focus on a topic with wide applications, going beyond GCN for text-attributed graphs. The proposed method can be applied to any document without explicitly extracting links. 2. This paper theoretically proves that a KNN graph can approximate the degree distribution of a scale-free network. 3. The paper presents an extensive set of experiments with several reference citation networks (Cora, Pubmed, ogbn-arxiv and arxiv23). Comprehensive experiments have been conducted using differen
1. Contribution and Achievements are not clearly introduced: To the best of our knowledge, this work is the first attempt to integrate graph generation and text embedding into a unified GLM framework. 2. Algorithm Design: The introduction of the algorithm design should include necessary details in the main paper, not just in the Appendix. The roles of the GCN, text embedding, and pseudo-labeler should be explained in the main paper. A complexity analysis of each step should be provided shortly
1. The paper proposes a new KNN graph that has the scale-free property and can serve as a good prior for scale-free graphs such as citation graphs. 2. The paper is the first to apply scale-free graphs to text-attributed citation networks. 3. The proposed approach is carefully verified both empirically and theoretically. 4. The proposed method outperforms strong baselines. 5. The experiments are solid and comprehensive. 6. Detailed ablation stu
1. The explanation of the correlation between scale-free network theory and the KNN graph is a bit insubstantial; the theoretical explanation does not provide sufficient intuition for the approach.
1. The idea is novel. A unique contribution is the use of KNN for graph generation, which I think will be very inspiring for semi-supervised tasks in homogeneous networks. In particular, the comparison with the real graph in Table 6 shows that the generated graph is even better than the real graph. 2. The paper is well-written, providing a detailed description of the problem setting and research background. The authors clearly articulate the challenges in existing graph-language models and their motivatio
1. The paper primarily focuses on citation networks, which are inherently homophilic. The applicability of the proposed KNN-based graph generation method to other types of textual networks, such as academic or social networks, is not explored. 2. The core contribution of the paper lies in using KNN to approximate graphs, but this approach seems to be more suited for homophilic networks. In more diverse scenarios, such as heterogeneous academic networks with different types of nodes, the use
Taxonomy
Methods: GLM
