VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections

Dongqi Fu; Zhigang Hua; Yan Xie; Jin Fang; Si Zhang; Kaan Sancak; Hao Wu; Andrey Malevich; Jingrui He; Bo Long

arXiv:2403.16030·cs.LG·March 26, 2024·3 cites

TL;DR

VCR-Graphormer introduces a mini-batch graph transformer that uses personalized PageRank tokenization and virtual connections to efficiently encode complex graph information, enabling scalable and expressive graph learning.

Contribution

The paper proposes a novel mini-batch training method for graph transformers using PPR tokenization and virtual connections, improving scalability and representation capacity.
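This summary does not say how the virtual connections are constructed. A minimal sketch of one plausible reading, assuming each virtual connection links a node to a shared "super node" for its group (for example a graph-partition cluster or a class label), might look like the following. The function name `add_super_nodes`, the `("super", g)` naming scheme, and the `group_of` mapping are hypothetical and introduced only for illustration.

```python
def add_super_nodes(neighbors, group_of):
    """Return a copy of the graph with one virtual node per group.

    neighbors: dict mapping node -> list of adjacent nodes (undirected).
    group_of:  dict mapping node -> group id (assumed to come from a graph
               partitioner or from labels); nodes missing from it get no
               virtual connection.
    """
    out = {v: list(nbrs) for v, nbrs in neighbors.items()}  # copy, do not mutate input
    for v, g in group_of.items():
        s = ("super", g)          # hypothetical identifier for the virtual node
        out.setdefault(s, [])
        out[s].append(v)          # virtual edge: super node -> member
        out[v].append(s)          # virtual edge: member -> super node
    return out


# Two disconnected edges; nodes 0 and 2 share group "a".
graph = {0: [1], 1: [0], 2: [3], 3: [2]}
rewired = add_super_nodes(graph, {0: "a", 2: "a"})
```

In the rewired graph, nodes 0 and 2 are two hops apart through the virtual node even though they were unreachable from each other before, so a walk-based sampler run on it can reach distant but related nodes.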

Findings

1. Enables effective mini-batch training of graph transformers.
2. Decouples feature engineering from model training.
3. Supports local, global, and heterophilous information encoding.

Abstract

Graph transformers have proven effective for graph learning because their attention mechanism can capture expressive representations from the complex topological and feature information of graphs. A graph transformer conventionally performs dense (global) attention over every pair of nodes to learn node representation vectors, which incurs a quadratic computational cost that is unaffordable for large-scale graph data. Mini-batch training for graph transformers is therefore a promising direction, but the limited samples in each mini-batch cannot support effective dense attention for encoding informative representations. Facing this bottleneck, (1) we start by assigning each node a token list that is sampled by personalized PageRank (PPR) and then apply standard multi-head self-attention only on this list to compute its node representations. This PPR…
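The first step of the abstract (a PPR-sampled token list per node, with attention applied only to that list) can be sketched as follows. This is an illustrative simplification, not the authors' implementation: PPR is approximated by plain power iteration, the restart probability `alpha=0.15` is an assumed default, "sampling" is replaced by taking the top-k PPR scores, and the attention is a single head with identity projections rather than the standard multi-head layer the abstract names. All function names are hypothetical.

```python
import math


def ppr_scores(neighbors, source, alpha=0.15, iters=100):
    """Approximate personalized PageRank from `source` by power iteration.

    neighbors: dict mapping node -> list of adjacent nodes (undirected).
    alpha: restart probability (assumed value, not taken from the paper).
    """
    r = {v: 0.0 for v in neighbors}
    r[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in neighbors}
        nxt[source] = alpha                  # restart mass returns to the source
        for v, nbrs in neighbors.items():
            if not nbrs:
                continue                     # isolated node: its mass is dropped here
            share = (1 - alpha) * r[v] / len(nbrs)
            for w in nbrs:
                nxt[w] += share              # spread the remaining mass evenly
        r = nxt
    return r


def ppr_token_list(neighbors, source, k):
    """Target node first, followed by its k highest-PPR other nodes."""
    r = ppr_scores(neighbors, source)
    ranked = sorted((v for v in neighbors if v != source), key=lambda v: -r[v])
    return [source] + ranked[:k]


def self_attention(X):
    """Single-head scaled dot-product attention with identity Q/K/V projections."""
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in X]
        m = max(scores)                      # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wi / z * x[j] for wi, x in zip(w, X)) for j in range(d)])
    return out


# Path graph 0-1-2-3-4 with toy two-dimensional features.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
features = {v: [float(v), 1.0] for v in path}
tokens = ppr_token_list(path, source=0, k=2)
H = self_attention([features[v] for v in tokens])
```

Because attention runs over k + 1 tokens per target node instead of over all node pairs, the per-node cost depends on k rather than on the size of the graph, and each node's token list can be precomputed and batched independently.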

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

The scalable graph transformer is a hot, interesting, and important field in our research community. Training with mini-batches is memory-efficient by nature. The experiments are extensive (but focus on homophilous datasets). The proposed method outperforms baselines by a large margin, especially for heterophilous graphs.

Weaknesses

- The paper is hard to follow. The abuse of notation in Eqs. 3.1, 3.2, and 3.3 may confuse readers. Please formally define the operations or use well-known ones. Is {.} a set or a list? Please use proper set-builder notation. Please add l (the l-th step) to the name of the variable r_u. Is the concatenation operator applied to both scalars and vectors? As written, the cardinality of T_u in Eq. 3.3 is 4. Plus, it would be nice if the authors explained what insights we can see

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

- The presentation is easy to follow, and the intuition behind the approach is well supported.
- The types and number of datasets and baseline models are adequate for demonstrating the efficacy of the approach, for the node classification task in particular. The ablation studies are very informative (Figures 3 and 4).

Weaknesses

- The inclusion of additional graph learning tasks (edge/graph classification) would further establish the validity/generality of the token list preparation approach proposed.

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 5

Strengths

1. This paper summarizes four metrics for graph tokenization methods.
2. This paper leverages existing techniques, like PPR and the graph partition method, to generate the token list for each target node.
3. This paper provides several theoretical analyses for the proposed method.
4. Empirical results on different scale datasets seem to indicate the promising performance of the proposed method.

Weaknesses

1. Several recent studies on designing graph Transformers with node sampling or node clustering are ignored.
2. The experimental results are insufficient to demonstrate the merits of the proposed method.

Code & Models

Repositories

dongqifu/vcr-graphormer
PyTorch · Official

Taxonomy

Topics: Interconnection Networks and Systems · Graph Theory and Algorithms

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Absolute Position Encodings · Laplacian EigenMap · Position-Wise Feed-Forward Layer · Residual Connection · Multi-Head Attention · Softmax