Relational Graph Transformer
Vijay Prakash Dwivedi, Sri Jaladi, Yangyi Shen, Federico López, Charilaos I. Kanatsoulis, Rishi Puri, Matthias Fey, Jure Leskovec

TL;DR
This paper introduces RelGT, a novel relational graph transformer architecture that effectively models complex, heterogeneous, and temporal relational data, outperforming traditional GNNs on multiple benchmark tasks.
Contribution
RelGT is the first graph transformer tailored for relational tables, using a multi-element tokenization strategy to encode heterogeneity, temporality, and topology efficiently.
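As a rough illustration of what a multi-element token could look like, the sketch below encodes each sampled node from separate elements (raw features, node type, hop distance to the seed node, and time gap to the seed) and merges them into one vector. Everything here is an assumption made for illustration: the names `encode_token` and `tokenize_subgraph`, the random lookup tables standing in for learned weights, the embedding width, and the choice to combine elements by summation are not taken from the paper, and the subgraph positional encoding is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # hypothetical embedding width

# Hypothetical lookup tables; in a trained model these would be learned.
NODE_TYPES = {"user": 0, "order": 1, "product": 2}
MAX_HOPS = 3
type_emb = rng.normal(size=(len(NODE_TYPES), DIM))
hop_emb = rng.normal(size=(MAX_HOPS + 1, DIM))
feat_proj = rng.normal(size=(4, DIM))  # projects 4 raw features to DIM
time_proj = rng.normal(size=(1, DIM))  # projects a scalar time gap to DIM


def encode_token(features, node_type, hop, seed_time, node_time):
    """Encode one sampled node from its separate elements, then merge them."""
    x_feat = np.asarray(features, dtype=float) @ feat_proj  # node attributes
    x_type = type_emb[NODE_TYPES[node_type]]                # heterogeneity
    x_hop = hop_emb[min(hop, MAX_HOPS)]                     # relative structure
    dt = np.array([float(seed_time - node_time)])           # time gap to seed
    x_time = dt @ time_proj                                 # temporality
    # Assumption: elements are merged by summation; other schemes are possible.
    return x_feat + x_type + x_hop + x_time


def tokenize_subgraph(nodes, seed_time):
    """Turn a list of sampled nodes into a (num_nodes, DIM) token matrix."""
    return np.stack([
        encode_token(n["features"], n["type"], n["hop"], seed_time, n["time"])
        for n in nodes
    ])


# Usage: a seed user with one linked order and one linked product.
nodes = [
    {"features": [1, 0, 0, 0], "type": "user", "hop": 0, "time": 50},
    {"features": [0, 1, 0, 0], "type": "order", "hop": 1, "time": 42},
    {"features": [0, 0, 1, 0], "type": "product", "hop": 2, "time": 30},
]
tokens = tokenize_subgraph(nodes, seed_time=50)
```

The resulting token matrix is the kind of input a standard transformer encoder could attend over. Because every element is computed per node within a sampled subgraph, this sketch needs no positional encoding computed over the full graph.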
Findings
RelGT matches or exceeds GNN performance on 21 benchmark tasks.
RelGT outperforms GNN baselines by up to 18%.
The architecture effectively captures complex relational structures.
Abstract
Relational Deep Learning (RDL) is a promising approach for building state-of-the-art predictive models on multi-table relational data by representing it as a heterogeneous temporal graph. However, commonly used Graph Neural Network models suffer from fundamental limitations in capturing complex structural patterns and long-range dependencies that are inherent in relational data. While Graph Transformers have emerged as powerful alternatives to GNNs on general graphs, applying them to relational entity graphs presents unique challenges: (i) Traditional positional encodings fail to generalize to massive, heterogeneous graphs; (ii) existing architectures cannot model the temporal dynamics and schema constraints of relational data; (iii) existing tokenization schemes lose critical structural information. Here we introduce the Relational Graph Transformer (RelGT), the first graph transformer…
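To make the "heterogeneous temporal graph" view of multi-table data concrete, here is a minimal sketch under stated assumptions: a toy two-table database in which each row becomes a typed node, each foreign-key reference becomes a typed edge, and each node carries its row timestamp. The tables, column names, the `placed_by` relation name, and the helpers `build_entity_graph` and `neighbors_before` are hypothetical and not taken from the paper.

```python
from collections import defaultdict

# Hypothetical two-table database: each row has a primary key and a timestamp.
users = [
    {"user_id": 1, "ts": 10},
    {"user_id": 2, "ts": 12},
]
orders = [
    {"order_id": 100, "user_id": 1, "ts": 20},
    {"order_id": 101, "user_id": 1, "ts": 35},
    {"order_id": 102, "user_id": 2, "ts": 40},
]

EDGE_TYPE = ("order", "placed_by", "user")  # hypothetical relation name


def build_entity_graph(users, orders):
    """Rows become typed nodes; foreign-key references become typed edges."""
    nodes = {}
    edges = defaultdict(list)
    for row in users:
        nodes[("user", row["user_id"])] = {"time": row["ts"]}
    for row in orders:
        src = ("order", row["order_id"])
        dst = ("user", row["user_id"])  # foreign key orders.user_id -> users
        nodes[src] = {"time": row["ts"]}
        edges[EDGE_TYPE].append((src, dst))
    return nodes, dict(edges)


def neighbors_before(nodes, edges, target, cutoff):
    """Orders linked to `target` with timestamp <= cutoff (no future rows)."""
    return [
        src
        for src, dst in edges[EDGE_TYPE]
        if dst == target and nodes[src]["time"] <= cutoff
    ]


nodes, edges = build_entity_graph(users, orders)
visible = neighbors_before(nodes, edges, ("user", 1), cutoff=30)
```

Filtering neighbors by a cutoff time is one common way to keep a prediction made at time t from seeing rows created after t; whether RelGT samples neighbors exactly this way is not stated in the text above.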
Peer Reviews
Decision: ICLR 2026 Poster
1. The detailed ablation study verifies the effectiveness of the model. The tables provide ablation results for all model components on all datasets, and these results show consistent performance gains from the designs in this work. 2. Careful architecture design. Table 3 illustrates the architecture in detail.
1. A recent strong RDB baseline [1] is missing. Including it may make the contribution clearer. 2. RelGT is trained on each dataset separately, with no pretraining or transfer-learning experiments, which makes the architectural design contribution less significant. [1] Yanbo Wang, et al. Griffin: Towards a graph-centric relational database foundation model. ICML 2025.
1. The multi-element tokenization is a clean, compositional alternative to “one-shot” global PEs, explicitly encoding heterogeneity, relative structure, and time, plus a lightweight subgraph PE. This is well-motivated for relational graphs and distinct from standard GTs or HGT variants. 2. Establishing a transformer baseline that consistently competes with or surpasses RDL’s hetero-GNN baseline on RelBench is meaningful; the design choices are broadly applicable in enterprise REG settings and co
See Questions.
1. Clear motivation and a good explanation of the problem. 2. The architecture is well explained, the different modules are well motivated and justified, and an ablation study analyses their benefits. Each component is linked to a specific challenge in relational deep learning, which is very good. 3. Extensive experiments: The evaluation is broad and the results are convincing. That said, as I am not an expert in all recent baselines for this specific area, I cannot definitively assess the completeness of t
### 1. Relation to Temporal Knowledge Graphs (TKG)
* There is no proper distinction between relational entity graphs and TKGs (see e.g. TGB 2.0), especially in Section 2.1 (challenges). While I agree that relational entity graphs are difficult and challenging, and different from conventional graph data, this section does not, in my opinion, clarify how they differ from temporal knowledge graphs. In my opinion, the differences are mostly in the entity-specific attributes - do you agree? If yes, it wou
Taxonomy
Topics: Advanced Graph Neural Networks · Data Quality and Management · Machine Learning in Healthcare
Methods: Attention Is All You Need · Laplacian EigenMap · Laplacian Positional Encodings · Linear Layer · Multi-Head Attention · Dense Connections · Graph Transformer · Layer Normalization · Graph Neural Network · Byte Pair Encoding
