Transformers Generalize DeepSets and Can be Extended to Graphs and Hypergraphs

Jinwoo Kim; Saeyoon Oh; Seunghoon Hong

arXiv:2110.14416 · cs.LG · January 25, 2022

TL;DR

This paper extends Transformers to higher-order permutation-invariant data such as graphs and hypergraphs, introduces sparse and kernel attention variants for scalability, and reports better performance than invariant MLPs and GNNs on large-scale graph tasks.

Contribution

We propose higher-order Transformers for complex data structures, reducing computational complexity and enhancing expressiveness compared to prior invariant models.
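To make the complexity claims concrete, here is a small Python sketch that counts the leading-order number of pairwise attention scores, using the complexities stated in the abstract: $O(n^{2k})$ for dense order-$k$ attention over $n$ nodes, quadratic in the number of hyperedges $m$ for the sparse variant, and linear in $m$ with kernel attention. The function names and the example graph size are hypothetical, and constants and the feature dimension are ignored.

```python
def attention_pair_count(num_tokens: int) -> int:
    """Pairwise query-key scores for full softmax attention over num_tokens tokens."""
    return num_tokens * num_tokens


def dense_order_k_cost(n: int, k: int) -> int:
    # Dense order-k attention treats every k-tuple of nodes as a token:
    # n**k tokens, hence (n**k)**2 = n**(2k) pairwise scores.
    return attention_pair_count(n ** k)


def sparse_cost(m: int) -> int:
    # Sparse variant: only the m observed (hyper)edges are tokens -> m**2 scores.
    return attention_pair_count(m)


def kernel_cost(m: int) -> int:
    # Kernel (linearized) attention touches each of the m tokens a constant
    # number of times -> O(m), ignoring the feature dimension.
    return m


if __name__ == "__main__":
    n, m = 1_000, 5_000  # hypothetical graph: 1,000 nodes, 5,000 edges
    print(dense_order_k_cost(n, k=2))  # 1000000000000 (10**12)
    print(sparse_cost(m))              # 25000000
    print(kernel_cost(m))              # 5000
```

For sparse graphs, where $m$ is far smaller than $n^2$, the gap between the dense and sparse counts is what makes the sparse and kernel variants practical.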

Findings

1. Sparse higher-order Transformers are more expressive than message-passing GNNs.

2. Kernel attention reduces the complexity of sparse higher-order Transformers from quadratic to linear in the number of input hyperedges.

3. The models outperform invariant MLPs and GNNs on large-scale graph tasks.
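The kernel-attention finding rests on regrouping the attention sum. With a positive feature map $\phi$, the output $\mathrm{out}_i = \frac{\sum_j \phi(q_i)^\top \phi(k_j)\, v_j}{\sum_j \phi(q_i)^\top \phi(k_j)}$ equals $\frac{\phi(q_i)^\top \sum_j \phi(k_j) v_j^\top}{\phi(q_i)^\top \sum_j \phi(k_j)}$, so the key/value summary is computed once and reused for every query. The following is a minimal sketch, assuming the simple map $\phi(x) = \mathrm{elu}(x) + 1$. That map is a swapped-in choice, since the Methods list names Performer's positive orthogonal random features instead. The function names are hypothetical. The sketch checks that the O(n) and O(n²) forms agree.

```python
import math
import random


def feature_map(x):
    # phi(x) = elu(x) + 1 elementwise: x + 1 for x > 0, exp(x) otherwise.
    # Strictly positive, so every kernel score phi(q).phi(k) is > 0.
    # (Assumption: a simple stand-in for Performer's random features.)
    return [t + 1.0 if t > 0 else math.exp(t) for t in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quadratic_kernel_attention(q, k, v):
    """O(n^2): form every score phi(q_i).phi(k_j) explicitly."""
    fk = [feature_map(x) for x in k]
    out = []
    for x in q:
        fq = feature_map(x)
        scores = [dot(fq, kj) for kj in fk]
        total = sum(scores)
        out.append([sum(s * vj[b] for s, vj in zip(scores, v)) / total
                    for b in range(len(v[0]))])
    return out


def linear_kernel_attention(q, k, v):
    """O(n): summarize keys/values once, then reuse the summary per query."""
    fk = [feature_map(x) for x in k]
    dk, dv = len(fk[0]), len(v[0])
    kv = [[0.0] * dv for _ in range(dk)]  # sum_j phi(k_j) v_j^T  (dk x dv)
    ksum = [0.0] * dk                     # sum_j phi(k_j)
    for kj, vj in zip(fk, v):
        for a in range(dk):
            ksum[a] += kj[a]
            for b in range(dv):
                kv[a][b] += kj[a] * vj[b]
    out = []
    for x in q:
        fq = feature_map(x)
        denom = dot(fq, ksum)
        out.append([sum(fq[a] * kv[a][b] for a in range(dk)) / denom
                    for b in range(dv)])
    return out


if __name__ == "__main__":
    random.seed(0)
    n, d = 6, 3  # n tokens (e.g. edges in the sparse setting), d features
    q, k, v = ([[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
               for _ in range(3))
    slow = quadratic_kernel_attention(q, k, v)
    fast = linear_kernel_attention(q, k, v)
    # Largest elementwise difference between the two forms (float rounding only).
    print(max(abs(a - b) for ra, rb in zip(slow, fast) for a, b in zip(ra, rb)))
```

The rows of `q`, `k`, `v` are generic token features here. In the sparse second-order setting they would be per-edge features, but nothing in the sketch is specific to graphs.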

Abstract

We present a generalization of Transformers to any-order permutation-invariant data (sets, graphs, and hypergraphs). We begin by observing that Transformers generalize DeepSets, or first-order (set-input) permutation-invariant MLPs. Then, based on recently characterized higher-order invariant MLPs, we extend the concept of self-attention to higher orders and propose higher-order Transformers for order-$k$ data ($k = 2$ for graphs and $k > 2$ for hypergraphs). Unfortunately, higher-order Transformers turn out to have prohibitive complexity $O(n^{2k})$ in the number of input nodes $n$. To address this problem, we present sparse higher-order Transformers that have quadratic complexity in the number of input hyperedges, and further adopt the kernel attention approach to reduce the complexity to linear. In particular, we show that the sparse second-order Transformers with kernel…
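The observation that Transformers generalize DeepSets can be checked numerically. What follows is a minimal illustration under stated assumptions, not the paper's construction. It takes a mean-normalized DeepSets layer $y_i = W_1 x_i + W_2 \frac{1}{n}\sum_j x_j$. Single-head softmax self-attention with all-zero query and key weights assigns the uniform weight $1/n$ to every element, so its output is exactly the pooled term $W_2 \frac{1}{n}\sum_j x_j$. The per-element term $W_1 x_i$ is added separately, standing in for the residual path (an assumption). All function names are hypothetical.

```python
import math


def matvec(W, x):
    # W is a list of rows; returns the vector W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [t / z for t in e]


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the rows of X."""
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    scale = 1.0 / math.sqrt(len(K[0]))
    out = []
    for q in Q:
        w = softmax([scale * sum(a * b for a, b in zip(q, kj)) for kj in K])
        out.append([sum(wj * vj[c] for wj, vj in zip(w, V))
                    for c in range(len(V[0]))])
    return out


def deepsets_layer(X, W1, W2):
    """Mean-normalized DeepSets layer: y_i = W1 x_i + W2 * mean_j x_j."""
    mean = [sum(col) / len(X) for col in zip(*X)]
    pooled = matvec(W2, mean)
    return [[a + b for a, b in zip(matvec(W1, x), pooled)] for x in X]


def attention_as_deepsets(X, W1, W2):
    """Same layer built from self-attention with zero query/key weights."""
    d = len(X[0])
    zeros = [[0.0] * d for _ in range(d)]
    # All scores are 0, so softmax gives weight 1/n to every element and the
    # attention output is W2 * mean_j x_j for every i.
    attn = self_attention(X, zeros, zeros, W2)
    # The per-element term W1 x_i stands in for the residual path (assumption).
    return [[a + h for a, h in zip(matvec(W1, x), hi)] for x, hi in zip(X, attn)]


if __name__ == "__main__":
    X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
    W1 = [[1.0, 0.0], [0.5, 2.0]]
    W2 = [[0.2, -1.0], [1.0, 1.0]]
    print(deepsets_layer(X, W1, W2))
    print(attention_as_deepsets(X, W1, W2))
```

With nonzero query and key weights, the attention weights depend on the inputs instead of being fixed at $1/n$. In that loose sense, attention contains the DeepSets pooling as a special case.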

Taxonomy

Topics: Advanced Graph Neural Networks · Energy and Environment Impacts

Methods: Linear Layer · Fast Attention Via Positive Orthogonal Random Features · Performer · Dropout · Dense Connections · Softmax · Multi-Head Attention · Layer Normalization · Attention Is All You Need · Transformer