Attribute-Enhanced Similarity Ranking for Sparse Link Prediction

João Mattos; Zexi Huang; Mert Kosan; Ambuj Singh; Arlei Silva

arXiv:2412.00261·cs.LG·December 3, 2024

Open Access · 3 Reviews

TL;DR

This paper introduces Gelato, a similarity-based link prediction method that leverages node attributes and advanced sampling techniques to outperform GNNs in sparse, real-world graph scenarios.

Contribution

The paper proposes Gelato, a novel link prediction approach combining attribute-enhanced similarity, ranking loss, and efficient negative sampling, addressing GNN limitations in sparse graphs.

Findings

01 · Gelato outperforms GNN-based methods in sparse link prediction tasks.

02 · Attribute integration improves prediction accuracy in sparse graphs.

03 · Efficient negative sampling enhances training effectiveness.

Abstract

Link prediction is a fundamental problem in graph data. In its most realistic setting, the problem consists of predicting missing or future links between random pairs of nodes from the set of disconnected pairs. Graph Neural Networks (GNNs) have become the predominant framework for link prediction. GNN-based methods treat link prediction as a binary classification problem and handle the extreme class imbalance -- real graphs are very sparse -- by sampling (uniformly at random) a balanced number of disconnected pairs not only for training but also for evaluation. However, we show that the reported performance of GNNs for link prediction in the balanced setting does not translate to the more realistic imbalanced setting and that simpler topology-based approaches are often better at handling sparsity. These findings motivate Gelato, a similarity-based link-prediction method that applies…
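The gap between balanced and imbalanced evaluation described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the scorer is synthetic (Gaussian scores, with true links scoring higher on average), the pair counts are arbitrary, and `precision_at_k` and `evaluate` are hypothetical helper names. It is not the paper's experimental protocol or Gelato itself.

```python
import numpy as np


def precision_at_k(scores, labels, k):
    """Fraction of true links among the k highest-scored pairs."""
    top = np.argsort(-scores)[:k]
    return float(labels[top].mean())


def evaluate(n_pos=100, n_neg=100_000, seed=0):
    """Score one synthetic test set under balanced and imbalanced evaluation."""
    rng = np.random.default_rng(seed)
    # Assumed scorer: positives score higher on average, but the
    # two score distributions overlap.
    pos_scores = rng.normal(1.0, 1.0, n_pos)
    neg_scores = rng.normal(0.0, 1.0, n_neg)

    # Balanced setting: sample as many negatives as positives, uniformly at random.
    sampled_neg = rng.choice(neg_scores, size=n_pos, replace=False)
    bal_scores = np.concatenate([pos_scores, sampled_neg])
    bal_labels = np.concatenate([np.ones(n_pos), np.zeros(n_pos)])

    # Imbalanced setting: keep every disconnected pair as a negative.
    full_scores = np.concatenate([pos_scores, neg_scores])
    full_labels = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])

    k = n_pos
    return (
        precision_at_k(bal_scores, bal_labels, k),
        precision_at_k(full_scores, full_labels, k),
    )


balanced, imbalanced = evaluate()
print(f"precision@k  balanced={balanced:.2f}  imbalanced={imbalanced:.2f}")
```

With the same scorer, the top-ranked pairs in the imbalanced setting are dominated by the highest-scoring negatives, so the precision measured there is much lower than in the balanced setting. The exact numbers depend on the assumed score distributions and class ratio.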

Peer Reviews

Decision·Submitted to ICLR 2024

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

+ Considering that various negative pair samples are actually far from the real links, the motivation that targets unbiased testing is reasonable and the idea makes sense.
+ The writing is clear to show all the details of the proposed Gelato.
+ For the metric hit@1000, Gelato outperforms other baselines on most benchmarks, especially the sparse graphs, which empirically proves the validity of Gelato on sparse graphs.

Weaknesses

- The design combines attribute similarity, topological weights, untrained weights, and trained weights, and it introduces three hyperparameters, $\epsilon_{\eta}$, $\alpha$, and $\beta$, to control them, which makes it harder to tune an optimal model.
- Although the individual components are carefully designed to improve Gelato's performance, an ablation study showing their actual effect is missing. For example, does the N-pair loss really work, and is it better than CE? When ignoring the

Reviewer 02 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

N/A

Weaknesses

The chief problem the paper raises is the small portion of negative pairs used in testing. The arguments do not back up the claim.
- These balanced sampling methods overestimate the ratio of positive pairs. There is no evidence that any of these methods estimate ratios of positive pairs.
- "AUC is not an effective evaluation metric for link prediction as it is biased towards the majority class". This needs evidence!
- The example shows that negative pairs have <2% of intra-block pairs. What

Reviewer 03 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

1. The paper is easy to follow with reasonable clarity to understand the techniques.
2. Experiments are conducted on several benchmark datasets.

Weaknesses

1. The techniques in the proposed method in Section 3.1 to 3.4 are mostly existing techniques simply adopted into the paper. The novelty of the proposed method is quite unclear. Many attributed graph representation learning methods exist.
2. The unbiased setting is not well-motivated. Why consider all random node pairs in a graph? For disconnected pairs, given a node, you can just sample the node pairs near the node via graph topology, e.g., within 2-hop, 3-hop. I do not agree that random sampl

Taxonomy

Topics: Complex Network Analysis Techniques · Advanced Graph Neural Networks · Advanced Clustering Algorithms Research

Methods: Sparse Evolutionary Training