Relational Graph Transformer

Vijay Prakash Dwivedi; Sri Jaladi; Yangyi Shen; Federico López; Charilaos I. Kanatsoulis; Rishi Puri; Matthias Fey; Jure Leskovec

arXiv:2505.10960·cs.LG·February 6, 2026

Open Access · 1 Repo · 3 Reviews

TL;DR

This paper introduces RelGT, a novel relational graph transformer architecture that effectively models complex, heterogeneous, and temporal relational data, outperforming traditional GNNs on multiple benchmark tasks.

Contribution

RelGT is the first graph transformer tailored for relational tables, using a multi-element tokenization strategy to encode heterogeneity, temporality, and topology efficiently.
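As a rough illustration of what a multi-element token could look like, the sketch below builds one token per node in a small neighborhood around a seed row. Each token carries the node's table type (heterogeneity), its hop distance from the seed (topology), and its time offset from the seed (temporality). Everything here is an assumption made for illustration: the names `Token` and `tokenize`, the breadth-first traversal, the rule that skips nodes newer than the seed, and the toy graph are hypothetical and are not taken from the paper or the snap-stanford/relgt repository. The subgraph positional encoding and all learned encoders are omitted.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Token:
    node_id: int
    node_type: str    # heterogeneity: which table the row comes from
    hop: int          # topology: hop distance from the seed node
    rel_time: float   # temporality: seed timestamp minus node timestamp
    features: tuple   # raw row features, left unencoded in this sketch


def tokenize(seed, adj, node_type, node_time, features, max_hops=2, max_tokens=8):
    """Collect up to max_tokens tokens around seed in breadth-first order.

    Hypothetical helper: a deterministic stand-in for neighborhood sampling.
    """
    seed_time = node_time[seed]
    hops = {seed: 0}
    queue = deque([seed])
    tokens = []
    while queue and len(tokens) < max_tokens:
        u = queue.popleft()
        tokens.append(
            Token(u, node_type[u], hops[u], seed_time - node_time[u], features[u])
        )
        if hops[u] == max_hops:
            continue  # do not expand beyond the hop limit
        for v in adj.get(u, ()):
            # Skip visited nodes and nodes newer than the seed
            # (assumed rule to avoid leaking future information).
            if v in hops or node_time[v] > seed_time:
                continue
            hops[v] = hops[u] + 1
            queue.append(v)
    return tokens


# Toy relational graph: one user row, two order rows, one item row.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
node_type = {0: "user", 1: "order", 2: "order", 3: "item"}
node_time = {0: 10.0, 1: 8.0, 2: 12.0, 3: 5.0}
features = {0: (1.0,), 1: (2.0,), 2: (3.0,), 3: (4.0,)}

tokens = tokenize(0, adj, node_type, node_time, features)
for t in tokens:
    print(t)
```

In this toy run, node 2 is left out because its timestamp is later than the seed's. In a real model each element would typically pass through its own encoder before the encoded elements are combined into a single token embedding; that step is not shown here.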

Findings

01

RelGT matches or exceeds GNN performance on 21 benchmark tasks.

02

RelGT outperforms GNN baselines by up to 18%.

03

The architecture effectively captures complex relational structures.

Abstract

Relational Deep Learning (RDL) is a promising approach for building state-of-the-art predictive models on multi-table relational data by representing it as a heterogeneous temporal graph. However, commonly used Graph Neural Network models suffer from fundamental limitations in capturing complex structural patterns and long-range dependencies that are inherent in relational data. While Graph Transformers have emerged as powerful alternatives to GNNs on general graphs, applying them to relational entity graphs presents unique challenges: (i) Traditional positional encodings fail to generalize to massive, heterogeneous graphs; (ii) existing architectures cannot model the temporal dynamics and schema constraints of relational data; (iii) existing tokenization schemes lose critical structural information. Here we introduce the Relational Graph Transformer (RelGT), the first graph transformer…

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. A detailed ablation study verifies the effectiveness of the model. The tables provide ablation results for all model components on all datasets, and these results show consistent performance gains from the designs in this work.
2. Careful architecture design. Table 3 illustrates the architecture in detail.

Weaknesses

1. A recent strong RDB baseline [1] is missing. Including it would make the contribution clearer.
2. RelGT is trained on each dataset separately, with no pretraining or transfer learning experiments, which makes the architectural design contribution less significant.

[1] Yanbo Wang, et al. Griffin: Towards a graph-centric relational database foundation model. ICML 2025.

Reviewer 02 · Rating 6 · Confidence 2

Strengths

1. The multi-element tokenization is a clean, compositional alternative to “one-shot” global PEs, explicitly encoding heterogeneity, relative structure, and time, plus a lightweight subgraph PE. This is well-motivated for relational graphs and distinct from standard GTs or HGT variants.
2. Establishing a transformer baseline that consistently competes with or surpasses RDL’s hetero-GNN baseline on RelBench is meaningful; the design choices are broadly applicable in enterprise REG settings and co

Weaknesses

See Questions.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. Clear motivation and a good explanation of the problem.
2. The architecture is well explained, the different modules are well motivated and justified, and an ablation study analyses their benefits. Each component is linked to a specific challenge in relational deep learning, which is very good.
3. Extensive experiments: the evaluation is broad and the results are convincing. That said, as I am not an expert in all recent baselines for this specific area, I cannot definitively assess the completeness of t

Weaknesses

1. Relation to Temporal Knowledge Graphs (TKG)

* There is no proper distinction between relational entity graphs and TKGs (see e.g. TGB 2.0), especially in Section 2.1 (challenges). While I agree that relational entity graphs are difficult and challenging, and different from conventional graph data, this section does not clarify how they differ from temporal knowledge graphs. In my opinion, the differences are mostly in the entity-specific attributes. Do you agree? If yes, it wou

Code & Models

Repositories

snap-stanford/relgt · PyTorch · Official

Taxonomy

Topics: Advanced Graph Neural Networks · Data Quality and Management · Machine Learning in Healthcare

Methods: Attention Is All You Need · Laplacian EigenMap · Laplacian Positional Encodings · Linear Layer · Multi-Head Attention · Dense Connections · Graph Transformer · Layer Normalization · Graph Neural Network · Byte Pair Encoding