ParaFormer: A Generalized PageRank Graph Transformer for Graph Representation Learning
Chaohao Yuan, Zhenjie Song, Ercan Engin Kuruoglu, Kangfei Zhao, Yang Liu, Deli Zhao, Hong Cheng, Yu Rong

TL;DR
ParaFormer introduces a PageRank-enhanced attention mechanism in graph transformers to mitigate over-smoothing, leading to improved performance in node and graph classification tasks across diverse datasets.
Contribution
The paper proposes a PageRank-based attention module that acts as an adaptive-pass filter, reducing over-smoothing in graph transformers.
Findings
ParaFormer outperforms existing models on 11 datasets.
Theoretical analysis confirms adaptive-pass filtering reduces over-smoothing.
Empirical results show consistent improvements in classification accuracy.
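The low-pass versus adaptive-pass distinction in these findings can be written out spectrally. The notation below is an assumption introduced for illustration and is not taken from the paper: A is a row-stochastic attention matrix, assumed diagonalizable with eigenvalues of magnitude at most 1, and the coefficients gamma_k are the generalized PageRank weights. This is a sketch of the general idea, not the paper's own analysis.

```latex
% Assumed notation (not from the source): A is row-stochastic and diagonalizable,
% with eigenvalues \lambda_i satisfying |\lambda_i| \le 1.
\begin{align*}
\text{$K$ stacked attention layers:}\quad
  & h_{\text{stack}}(\lambda) = \lambda^{K} \xrightarrow{K \to \infty} 0
    \quad \text{for } |\lambda| < 1, \\
\text{generalized PageRank:}\quad
  & h_{\text{GPR}}(\lambda) = \sum_{k=0}^{K} \gamma_k \lambda^{k}, \\
\text{e.g. } \gamma_k = \alpha (1-\alpha)^{k}:\quad
  & h_{\text{GPR}}(\lambda) \xrightarrow{K \to \infty} \frac{\alpha}{1 - (1-\alpha)\lambda},
    \quad 0 < \alpha < 1 .
\end{align*}
```

Repeated attention keeps only the component with eigenvalue 1, which is what makes node representations indistinguishable. A weighted sum of powers can keep the other components: the personalized PageRank weights give a response that is nonzero for every |λ| ≤ 1, and signed weights such as γ = (1, -1) give 1 - λ, which suppresses the smooth component instead. Weights that can be tuned between these regimes are one way to read "adaptive-pass".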
Abstract
Graph Transformers (GTs) have emerged as a promising graph learning tool, leveraging their all-pair connected property to effectively capture global information. To address the over-smoothing problem in deep GNNs, global attention was initially introduced, eliminating the necessity for using deep GNNs. However, through empirical and theoretical analysis, we verify that the introduced global attention exhibits severe over-smoothing, causing node representations to become indistinguishable due to its inherent low-pass filtering. This effect is even stronger than that observed in GNNs. To mitigate this, we propose PageRank Transformer (ParaFormer), which features a PageRank-enhanced attention module designed to mimic the behavior of deep Transformers. We theoretically and empirically demonstrate that ParaFormer mitigates over-smoothing by functioning as an adaptive-pass filter. Experiments…
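The over-smoothing effect described in the abstract can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the ParaFormer implementation: the attention matrix uses raw dot-product scores with no learned projections, it is computed once and reused at every layer, and the weights follow a fixed personalized PageRank schedule, gamma_k = alpha * (1 - alpha)^k. All function names (`attention_matrix`, `stacked`, `gpr_propagate`, `spread`) are hypothetical helpers introduced here.

```python
import math

def softmax_rows(scores):
    """Row-wise softmax, so every row of the result sums to 1."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def attention_matrix(x):
    """All-pair attention from scaled dot products (assumption: no learned Q/K)."""
    d = len(x[0])
    scores = [[sum(p * q for p, q in zip(xi, xj)) / math.sqrt(d) for xj in x]
              for xi in x]
    return softmax_rows(scores)

def spread(x):
    """Mean squared distance of node features from their feature-wise mean."""
    n, d = len(x), len(x[0])
    mean = [sum(row[j] for row in x) / n for j in range(d)]
    return sum((row[j] - mean[j]) ** 2 for row in x for j in range(d)) / n

def stacked(a, x, depth):
    """Apply the same attention matrix `depth` times: A^depth X."""
    for _ in range(depth):
        x = matmul(a, x)
    return x

def gpr_propagate(a, x, gammas):
    """Generalized PageRank style mix: sum_k gammas[k] * A^k X."""
    out = [[gammas[0] * v for v in row] for row in x]
    power = x
    for g in gammas[1:]:
        power = matmul(a, power)
        out = [[o + g * p for o, p in zip(orow, prow)]
               for orow, prow in zip(out, power)]
    return out

# Four toy nodes with 2-d features (made-up data for illustration only).
x = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
a = attention_matrix(x)
depth, alpha = 20, 0.5
deep = stacked(a, x, depth)
gammas = [alpha * (1 - alpha) ** k for k in range(depth + 1)]
mixed = gpr_propagate(a, x, gammas)
print("input spread:  ", spread(x))
print("stacked spread:", spread(deep))
print("GPR spread:    ", spread(mixed))
```

In this toy setting the stacked output collapses toward a single point, while the weighted sum of powers keeps the nodes apart. In generalized PageRank the weights are usually learnable rather than fixed; how the paper parameterizes them is not stated in this summary.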
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Domain Adaptation and Few-Shot Learning
