Aligning Transformers with Weisfeiler-Leman

Luis Müller; Christopher Morris

arXiv:2406.03148 · cs.LG · June 6, 2024

TL;DR

This paper strengthens transformer architectures aligned with the Weisfeiler-Leman hierarchy, improving both their expressivity and their practicality for graph tasks, and demonstrates competitive performance on the large-scale PCQM4Mv2 dataset and on small molecular datasets.

Contribution

The paper advances the alignment of transformers with the $k$-WL hierarchy, proving stronger expressivity results while making such transformers feasible in practice, and introduces a theoretical framework for studying positional encodings.
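The $k$-WL hierarchy starts from 1-WL color refinement, and each higher level $k$ refines colors of $k$-tuples of nodes instead of single nodes. As a minimal illustration of the base level only (not code from the paper or from the luis-mueller/wl-transformers repository), the Python sketch below implements standard 1-WL on plain adjacency-list graphs; the helper names `wl_colors` and `wl_distinguishes` are hypothetical.

```python
from collections import Counter


def wl_colors(adj: dict) -> dict:
    """1-WL color refinement with a uniform initial coloring (no node labels assumed)."""
    colors = {v: 0 for v in adj}
    for _ in range(len(adj)):  # at most n rounds are ever needed
        # Signature = own color plus the sorted multiset of neighbor colors.
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        # Relabel the distinct signatures injectively with small integers.
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        new_colors = {v: palette[sigs[v]] for v in adj}
        if len(set(new_colors.values())) == len(set(colors.values())):
            break  # no color class was split, so the coloring is stable
        colors = new_colors
    return colors


def wl_distinguishes(g1: dict, g2: dict) -> bool:
    """True if 1-WL assigns different color histograms to g1 and g2."""
    # Refine the disjoint union so both graphs share one color palette.
    union = {("a", v): [("a", u) for u in nbrs] for v, nbrs in g1.items()}
    union.update({("b", v): [("b", u) for u in nbrs] for v, nbrs in g2.items()})
    colors = wl_colors(union)
    hist1 = Counter(c for (side, _), c in colors.items() if side == "a")
    hist2 = Counter(c for (side, _), c in colors.items() if side == "b")
    return hist1 != hist2


# Usage: a 6-cycle vs. two disjoint triangles (both 2-regular on 6 nodes).
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(wl_distinguishes(hexagon, two_triangles))  # False: 1-WL cannot separate them
```

The 6-cycle versus two-triangles pair is a classic case where 1-WL fails, which is one motivation for climbing to higher $k$.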

Findings

1. Stronger expressivity results for transformers aligned with $k$-WL.
2. Competitive performance on the PCQM4Mv2 dataset.
3. Effective fine-tuning on small molecular datasets.

Abstract

Graph neural network architectures aligned with the $k$-dimensional Weisfeiler-Leman ($k$-WL) hierarchy offer theoretically well-understood expressive power. However, these architectures often fail to deliver state-of-the-art predictive performance on real-world graphs, limiting their practical utility. While recent works aligning graph transformer architectures with the $k$-WL hierarchy have shown promising empirical results, employing transformers for higher orders of $k$ remains challenging due to a prohibitive runtime and memory complexity of self-attention as well as impractical architectural assumptions, such as an infeasible number of attention heads. Here, we advance the alignment of transformers with the $k$-WL hierarchy, showing stronger expressivity results for each $k$, making them more feasible in practice. In addition, we develop a theoretical framework that allows the…
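To see why self-attention becomes prohibitive as $k$ grows, assume, as a simplification rather than the paper's exact architecture, that every $k$-tuple of nodes of an $n$-node graph is one token and that dense softmax attention scores every pair of tokens. The number of scores per head and layer then grows as follows:

```latex
N = n^{k}
\quad\Longrightarrow\quad
\#\{\text{attention scores per head}\} = N^{2} = n^{2k},
\qquad
n = 30:\; k = 2 \Rightarrow 900^{2} = 810{,}000,
\quad k = 3 \Rightarrow 27{,}000^{2} = 729{,}000{,}000.
```

Under this assumption, each increment of $k$ multiplies the attention cost by $n^{2}$, which is the kind of runtime and memory blow-up the abstract refers to.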

Code & Models

Repositories

luis-mueller/wl-transformers
PyTorch · Official

Taxonomy

Methods: Softmax · Laplacian EigenMap · Layer Normalization · Laplacian Positional Encodings · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need
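The taxonomy lists Laplacian positional encodings. As a generic sketch of that technique, assuming a dense NumPy adjacency matrix and the symmetric normalized Laplacian (the paper's exact Laplacian variant, normalization, and number of eigenvectors are not stated on this page), node encodings can be taken from the eigenvectors that follow the trivial first one. `laplacian_pe` is a hypothetical helper name, not an API of the paper's repository.

```python
import numpy as np


def laplacian_pe(adj: np.ndarray, k: int) -> np.ndarray:
    """Return k Laplacian-eigenvector positional encodings per node (hypothetical helper).

    adj: symmetric (n, n) adjacency matrix of an undirected graph.
    Uses the symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
    """
    deg = adj.sum(axis=1)
    # Guard isolated nodes: treat 1/sqrt(0) as 0.
    inv_sqrt = np.zeros(deg.shape, dtype=float)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    lap = np.eye(adj.shape[0]) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(lap)
    # Skip the first eigenvector (eigenvalue 0 for a connected graph); keep the next k.
    return eigvecs[:, 1 : k + 1]


# Usage: 2-dimensional encodings for the nodes of a 6-cycle.
n = 6
cycle = np.zeros((n, n))
for i in range(n):
    cycle[i, (i + 1) % n] = cycle[(i + 1) % n, i] = 1
pe = laplacian_pe(cycle, 2)
print(pe.shape)  # (6, 2)
```

Eigenvectors are determined only up to sign, and only up to a basis choice when eigenvalues repeat (as they do for the cycle above), so implementations commonly randomize signs during training rather than treating one choice as canonical.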