Attribute-Enhanced Similarity Ranking for Sparse Link Prediction
João Mattos, Zexi Huang, Mert Kosan, Ambuj Singh, Arlei Silva

TL;DR
This paper introduces Gelato, a similarity-based link prediction method that leverages node attributes and advanced sampling techniques to outperform GNNs in sparse, real-world graph scenarios.
Contribution
The paper proposes Gelato, a novel link prediction approach combining attribute-enhanced similarity, ranking loss, and efficient negative sampling, addressing GNN limitations in sparse graphs.
Findings
Gelato outperforms GNN-based methods in sparse link prediction tasks.
Attribute integration improves prediction accuracy in sparse graphs.
Efficient negative sampling enhances training effectiveness.
Abstract
Link prediction is a fundamental problem in graph data. In its most realistic setting, the problem consists of predicting missing or future links between random pairs of nodes from the set of disconnected pairs. Graph Neural Networks (GNNs) have become the predominant framework for link prediction. GNN-based methods treat link prediction as a binary classification problem and handle the extreme class imbalance -- real graphs are very sparse -- by sampling (uniformly at random) a balanced number of disconnected pairs not only for training but also for evaluation. However, we show that the reported performance of GNNs for link prediction in the balanced setting does not translate to the more realistic imbalanced setting and that simpler topology-based approaches are often better at handling sparsity. These findings motivate Gelato, a similarity-based link-prediction method that applies…
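The gap between balanced and imbalanced evaluation described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions: the scores are synthetic Gaussian draws rather than outputs of Gelato or any GNN, and the names `precision_at_k` and `simulate`, the class sizes, and the score distributions are all hypothetical choices made for illustration.

```python
import random


def precision_at_k(scores_pos, scores_neg, k):
    """Fraction of the k highest-scored pairs that are true links."""
    labeled = [(s, 1) for s in scores_pos] + [(s, 0) for s in scores_neg]
    labeled.sort(key=lambda t: t[0], reverse=True)
    top = labeled[:k]
    return sum(label for _, label in top) / len(top)


def simulate(num_pos=100, num_neg_total=100_000, seed=0):
    """Compare the same scorer under balanced and imbalanced evaluation.

    Assumption: a hypothetical scorer whose scores for true links are
    shifted up by one standard deviation relative to disconnected pairs.
    """
    rng = random.Random(seed)
    pos = [rng.gauss(1.0, 1.0) for _ in range(num_pos)]
    neg = [rng.gauss(0.0, 1.0) for _ in range(num_neg_total)]

    # Balanced setting: as many uniformly sampled negatives as positives.
    balanced_neg = rng.sample(neg, num_pos)
    balanced = precision_at_k(pos, balanced_neg, num_pos)

    # Imbalanced setting: rank positives against every disconnected pair.
    unbiased = precision_at_k(pos, neg, num_pos)
    return balanced, unbiased


if __name__ == "__main__":
    balanced, unbiased = simulate()
    print(f"precision@100, balanced negatives: {balanced:.2f}")
    print(f"precision@100, all negatives:      {unbiased:.2f}")
```

With identical scores, the same scorer looks far stronger when it only has to beat a handful of sampled negatives than when it must rank true links above all disconnected pairs, which is the mismatch the abstract points to.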
Peer Reviews
Decision·Submitted to ICLR 2024
+ Considering that various negative pair samples are actually far from the real links, the motivation that targets unbiased testing is reasonable and the idea makes sense.
+ The writing is clear and shows all the details of the proposed Gelato.
+ For the metric hit@1000, Gelato outperforms other baselines on most benchmarks, especially the sparse graphs, which empirically supports the validity of Gelato on sparse graphs.
- The design, which includes attribute similarity, topological weights, the untrained weights, and the trained weights, introduces three hyper-parameters $\epsilon_{\eta}$, $\alpha$, and $\beta$ to control the weights, which makes it more difficult to tune an optimal model.
- Though the different parts are carefully devised to improve the performance of Gelato, an ablation study showing their actual effect is missing. For example, does N-pair loss really work, and is it better than CE? When ignoring the
N/A
The chief problem the paper raises is the small portion of negative pairs used in testing. The arguments do not back up the claim.
- These balanced sampling methods overestimate the ratio of positive pairs. There is no evidence that any of these methods estimate ratios of positive pairs.
- "AUC is not an effective evaluation metric for link prediction as it is biased towards the majority class". This needs evidence!
- The example shows that negative pairs have <2% of intra-block pairs. What
1. The paper is easy to follow with reasonable clarity to understand the techniques.
2. Experiments are conducted on several benchmark datasets.
1. The techniques in the proposed method in Section 3.1 to 3.4 are mostly existing techniques simply adopted into the paper. The novelty of the proposed method is quite unclear. Many attributed graph representation learning methods exist.
2. The unbiased setting is not well-motivated. Why consider all random node pairs in a graph? For disconnected pairs, given a node, you can just sample the node pairs near the node via graph topology, e.g., within 2-hop, 3-hop. I do not agree that random sampl
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Graph Neural Networks · Advanced Clustering Algorithms Research
Methods: Sparse Evolutionary Training
