Bypassing Skip-Gram Negative Sampling: Dimension Regularization as a More Efficient Alternative for Graph Embeddings

David Liu; Arjun Seshadri; Tina Eliassi-Rad; Johan Ugander

arXiv:2405.00172·cs.LG·June 3, 2025

Open Access·3 Reviews

TL;DR

This paper introduces dimension regularization as a scalable and efficient alternative to Skip-Gram Negative Sampling for graph embeddings, maintaining performance while reducing computational resources.

Contribution

It provides a theoretical foundation for dimension regularization as an effective repulsion method and proposes a flexible framework to enhance existing algorithms like LINE and node2vec.

Findings

01. Reduces GPU memory usage by up to 33.3%

02. Reduces training time by 23.4%

03. Removing repulsion can increase link prediction performance in sparse graphs

Abstract

A wide range of graph embedding objectives decompose into two components: one that enforces similarity, attracting the embeddings of nodes that are perceived as similar, and another that enforces dissimilarity, repelling the embeddings of nodes that are perceived as dissimilar. Without repulsion, the embeddings would collapse into trivial solutions. Skip-Gram Negative Sampling (SGNS) is a popular and efficient repulsion approach that prevents collapse by repelling each node from a sample of dissimilar nodes. In this work, we show that when repulsion is most needed and the embeddings approach collapse, SGNS node-wise repulsion is, in the aggregate, an approximate re-centering of the node embedding dimensions. Such dimension operations are more scalable than node operations and produce a simpler geometric interpretation of the repulsion. Our theoretical result establishes dimension…
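The abstract's central claim, that near collapse the node-wise SGNS repulsion acts in aggregate like a re-centering of the embedding dimensions, can be illustrated with a small numerical sketch. This is not the authors' code. It assumes an all-pairs version of the negative-sampling term (every node serves as a fixed negative for every other node) and a hypothetical regularizer `0.5 * ||sum_j x_j||^2` that penalizes the per-dimension sums. The function names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgns_repulsion_grad(X):
    # Gradient w.r.t. x_i of sum_j -log(sigmoid(-x_i . x_j)), with every node j
    # (including i itself, for simplicity) treated as a fixed negative sample:
    #   g_i = sum_j sigmoid(x_i . x_j) * x_j
    return sigmoid(X @ X.T) @ X

def dimension_mean_grad(X):
    # Gradient w.r.t. x_i of the ASSUMED regularizer 0.5 * ||sum_j x_j||^2.
    # It is the same vector (the per-dimension sums) for every node.
    return np.tile(X.sum(axis=0), (X.shape[0], 1))

def row_cosine(A, B):
    # Cosine similarity between corresponding rows of A and B.
    num = (A * B).sum(axis=1)
    return num / (np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1))

# Embeddings close to collapse: every node sits near the same unit vector.
rng = np.random.default_rng(0)
n, d = 100, 8
base = np.ones(d) / np.sqrt(d)
X = base + 0.01 * rng.standard_normal((n, d))

G = sgns_repulsion_grad(X)   # node-wise repulsion gradient
D = dimension_mean_grad(X)   # dimension-level gradient
cos_collapsed = row_cosine(G, D)
print("min cosine near collapse:", cos_collapsed.min())
```

When all dot products are nearly equal, the sigmoid weights are nearly constant, so each node's repulsion gradient is close to a scaled copy of the per-dimension sums. A descent step then shifts every node by roughly the same vector, which amounts to re-centering. The dimension-level gradient needs only one pass over the embedding matrix instead of the pairwise products.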

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01·Rating 5·Confidence 3

Strengths

- Proposes a novel framework aimed at reducing running time and GPU memory requirements.
- Supports the proposed framework with theoretical analysis.

Weaknesses

- Given the current dominance of Graph Neural Networks (GNNs) in the graph representation learning domain, the practical relevance of the proposed framework may be limited.
- Application of the framework to Node2Vec and LINE results in significant performance drops (notably for Node2Vec), raising concerns about its practical effectiveness.
- The theoretical analysis provided lacks detail, making some claims difficult to follow.

Reviewer 02·Rating 5·Confidence 4

Strengths

+ The proposed method's formalization seems nice, especially in terms of how it performs in the presence of dimensional collapse.
+ The paper's writing is generally good, and the authors' methods are clear.

Weaknesses

- There are some questions about novelty, in that the proposed regularization perhaps "already exists" in the SSL community and replacing SGNS is a seemingly obvious application. However, I'm not aware of work that actually does this (... but have not extensively looked for it).
- The core argument of the paper is that the proposed method is more efficient than SGNS and therefore "more scalable". While the efficiency of the method is definitely better, it's not obvious that it's an "online" l

Reviewer 03·Rating 5·Confidence 3

Strengths

1. The paper has a pretty solid mathematical interpretation (section 2). The proof flow is very good and makes sense.
2. The paper is pretty well-written and easy to understand.

Weaknesses

1. Limited practical impact. The proposed augmentation shows consistent performance degradation in node2vec embeddings by approximately 5% (Table 2). The authors also fail to present potential use cases where their modifications would be beneficial for graph embedding algorithms nowadays.
2. Dataset selection is inadequate: while the key point of the proposed modifications lies in their potential for scalability, the dataset selection is limited to small and medium-scale datasets and misses evaluation on

Taxonomy

Topics: Advanced Graph Neural Networks · Face and Expression Recognition · Domain Adaptation and Few-Shot Learning

Methods: Large-scale Information Network Embedding · node2vec