Rethinking Graph Transformer Architecture Design for Node Classification

Jiajun Zhou; Xuanze Chen; Chenxuan Xie; Yu Shanqing; Qi Xuan; Xiaoniu Yang

arXiv:2410.11189·cs.LG·October 16, 2024

TL;DR

This paper proposes GNNFormer, a new graph transformer architecture that decouples propagation and transformation, improving node classification performance and scalability across diverse graph types.

Contribution

It introduces a novel P/T decoupled architecture, replacing the multi-head self-attention module with a more efficient design for node classification.
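To make the idea of decoupling concrete, here is a minimal sketch in plain Python that treats propagation (P) and transformation (T) as two independent steps that can be composed in any order. It is an illustration under stated assumptions, not the GNNFormer implementation: the function names `propagate` and `transform`, the mean aggregation with self-loops, and the ReLU nonlinearity are all choices made for this example and are not taken from the paper.

```python
# Illustrative sketch only: P and T as separate steps.
# Names, mean aggregation, and ReLU are assumptions, not the authors' code.

def propagate(adj, x):
    """P step: average each node's features with those of its neighbors."""
    n = len(x)
    out = []
    for i in range(n):
        # Node i plus every neighbor j marked in the adjacency matrix.
        neigh = [i] + [j for j in range(n) if adj[i][j] and j != i]
        dim = len(x[i])
        out.append([sum(x[j][d] for j in neigh) / len(neigh) for d in range(dim)])
    return out


def transform(x, weight, bias):
    """T step: per-node linear map followed by ReLU; ignores graph structure.

    weight[k] holds the input weights of output unit k.
    """
    out = []
    for row in x:
        out.append([
            max(0.0, sum(r * w for r, w in zip(row, col)) + b)
            for col, b in zip(weight, bias)
        ])
    return out


# Toy path graph 0 - 1 - 2 with two features per node.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
x = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
weight = [[1.0, 1.0],
          [1.0, -1.0]]
bias = [0.0, 0.0]

# Because the steps are decoupled, different orderings are easy to try.
h_pt = transform(propagate(adj, x), weight, bias)   # P then T
h_tp = propagate(adj, transform(x, weight, bias))   # T then P
```

Only `propagate` touches the graph, and its cost grows with the number of edges rather than with all node pairs, which is the kind of saving the contribution statement alludes to when it speaks of replacing self-attention.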

Findings

01. Effective on 12 benchmark datasets

02. Resists global noise in large graphs

03. Improves computational efficiency

Abstract

Graph Transformer (GT), a special type of Graph Neural Network (GNN), utilizes multi-head attention to facilitate high-order message passing. However, this also imposes several limitations in node classification applications: 1) nodes are susceptible to global noise; 2) self-attention computation does not scale well to large graphs. In this work, we conduct extensive observational experiments to explore the adaptability of the GT architecture in node classification tasks and draw several conclusions: the current multi-head self-attention module in GT can be completely replaced, while the feed-forward neural network module proves to be valuable. Based on this, we decouple the propagation (P) and transformation (T) of GNNs and explore a powerful GT architecture, named GNNFormer, which is based on P/T combination message passing and adapted for node classification in both…

Taxonomy

Topics: Advanced Graph Neural Networks

Methods: Dense Connections · Residual Connection · Dropout · Layer Normalization · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax · Attention Is All You Need · Linear Layer