Polynormer: Polynomial-Expressive Graph Transformer in Linear Time
Chenhui Deng, Zichao Yue, Zhiru Zhang

TL;DR
Polynormer is a novel polynomial-expressive graph transformer that achieves linear complexity, balancing expressivity and scalability, and outperforms existing models on large-scale graph datasets.
Contribution
We introduce Polynormer, a linear-time graph transformer that learns high-degree polynomial functions, enhancing expressivity while maintaining scalability.
Findings
Outperforms state-of-the-art GNN and GT models on most datasets (11 of 13, according to the peer reviews).
Effective on large graphs with millions of nodes.
Operates without nonlinear activation functions.
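The finding that no nonlinear activation is needed can be illustrated with a toy sketch. It assumes a layer of the form (W x) ⊙ (x + b) on scalar node features. That form is an illustrative assumption, not a formula quoted in this summary, and the names `poly_layer` and `stack` are hypothetical. Each layer multiplies two expressions that are affine in its input, so the polynomial degree doubles per layer and reaches 2^L after L layers.

```python
def poly_layer(x, W, b):
    """One activation-free layer: (W x) * (x + b), elementwise over nodes.

    x: list of N scalar node features (assumed scalar for simplicity)
    W: N x N mixing matrix as a list of rows (hypothetical weights)
    b: list of N biases
    """
    mixed = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    return [m * (x_i + b_i) for m, x_i, b_i in zip(mixed, x, b)]


def stack(x, weights, biases):
    """Apply the layers in sequence; no activation function anywhere."""
    for W, b in zip(weights, biases):
        x = poly_layer(x, W, b)
    return x


# Toy graph with 3 nodes; integers keep the arithmetic exact.
x = [1, 2, 3]
W = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
num_layers = 3
weights = [W] * num_layers
biases = [[0, 0, 0]] * num_layers  # zero bias makes the output homogeneous

t = 2
base = stack(x, weights, biases)
scaled = stack([t * v for v in x], weights, biases)

# Scaling the input by t scales the output by t ** (2 ** num_layers),
# which is the signature of a degree-2^L polynomial.
degree = 2 ** num_layers
print(all(s == t ** degree * v for s, v in zip(scaled, base)))
```

With zero biases the output is a homogeneous polynomial, so the scaling check is an exact test of the degree. Nonzero biases would add lower-degree terms without changing the top degree.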
Abstract
Graph transformers (GTs) have emerged as a promising architecture that is theoretically more expressive than message-passing graph neural networks (GNNs). However, typical GT models have at least quadratic complexity and thus cannot scale to large graphs. Although several linear GTs have recently been proposed, they still lag behind their GNN counterparts on several popular graph datasets, which raises a critical concern about their practical expressivity. To balance the trade-off between expressivity and scalability of GTs, we propose Polynormer, a polynomial-expressive GT model with linear complexity. Polynormer is built upon a novel base model that learns a high-degree polynomial on input features. To make the base model permutation equivariant, we integrate it with graph topology and node features separately, resulting in local and global equivariant attention models. Consequently, Polynormer…
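The contrast between quadratic and linear complexity in the abstract can be made concrete with a generic sketch. When no row-wise softmax is applied to the attention scores, the product (Q Kᵀ) V can be reordered as Q (Kᵀ V), which never builds the N × N score matrix. This is a common linear-attention trick shown under that assumption, not Polynormer's actual attention, which this summary does not spell out. All function names are hypothetical.

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def quadratic_attention(Q, K, V):
    """Builds the full N x N score matrix: O(N^2 * d) time and O(N^2) memory."""
    scores = matmul(Q, transpose(K))  # N x N
    return matmul(scores, V)          # N x d


def linear_attention(Q, K, V):
    """Builds a d x d summary first: O(N * d^2) time, no N x N matrix."""
    summary = matmul(transpose(K), V)  # d x d
    return matmul(Q, summary)          # N x d


# N = 4 nodes, d = 2 features; integers keep the comparison exact.
Q = [[1, 2], [0, 1], [3, 1], [2, 2]]
K = [[1, 0], [2, 1], [0, 3], [1, 1]]
V = [[1, 1], [0, 2], [4, 0], [1, 3]]

same = quadratic_attention(Q, K, V) == linear_attention(Q, K, V)
print(same)
```

The two orderings agree because matrix multiplication is associative. The cost differs because the intermediate result is d × d instead of N × N, and d is fixed while N grows with the graph.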
Peer Reviews
Decision: ICLR 2024 poster
Strengths:
- Provides a theoretical analysis of polynomial expressivity, though restricted to scalar features. This goes beyond the WL expressivity analysis common in the graph learning literature.
- Demonstrates the architecture's performance on 13 datasets, outperforming the baselines on 11 of them.
- An ablation study on a smaller group of datasets shows the benefits of global attention and the local-to-global scheme.
Weaknesses and Questions:
- The theoretical expressivity claims in Section 3.1 may overstate the model's capabilities, since the proofs make simplifying assumptions about scalar features that differ from real graph data (Section 4). Can this be justified further?
- Although complexity is analyzed, runtime and memory usage are not empirically compared to baselines in Section 4.2 to demonstrate scalability.

Strengths:
- The idea of adopting an attention model in the polynomial feature mapping is novel and interesting.
- The experiments are sufficient. Many important baselines and datasets of various sizes are covered.
Weaknesses and Questions:
- The proposed (global) approach may also work for transformers in other fields, e.g. NLP. Could you provide such experiments to show its capacity for handling different types of data?
- Why and how does polynomial expressivity improve model performance? This point was not clear.

Strengths:
- The idea of designing a polynomial-expressive graph Transformer model is novel and interesting.
- The resulting Polynormer model is powerful, efficient, and theoretically expressive.
- The experiments are convincing, showing that Polynormer can outperform state-of-the-art GNNs and GTs on a wide range of datasets.
Weaknesses and Questions:
- It is inappropriate to claim that GTs and GNNs have limited polynomial expressivity (in Section 3.1 and Appendix C), since the nonlinearity layers are not negligible. In [1] it is shown that GTs cannot represent GNNs without softmax, and in [2] Transformers are proved to be universal approximators on sequences, with the softmax layer as a key component. Can you discuss the polynomial expressivity of GTs and GNNs with nonlinearity layers? And since [2] proves that Transformers are universal appro
Taxonomy
Topics: Neural Networks and Applications · Semiconductor Lasers and Optical Devices · Photonic and Optical Devices
Methods: Graph Transformer
