Todyformer: Towards Holistic Dynamic Graph Transformers with Structure-Aware Tokenization
Mahdi Biparva, Raika Karimi, Faezeh Faez, Yingxue Zhang

TL;DR
Todyformer is a novel Transformer-based model for dynamic graphs that combines local and global encoding strategies to address over-squashing and over-smoothing, achieving superior performance on benchmark datasets.
Contribution
It introduces a structure-aware tokenization and an alternating encoding architecture to enhance dynamic graph modeling with Transformers.
Findings
Outperforms state-of-the-art methods on benchmark datasets
Effectively captures extensive temporal dependencies
Mitigates over-squashing and over-smoothing issues
Abstract
Temporal Graph Neural Networks have garnered substantial attention for their capacity to model evolving structural and temporal patterns while exhibiting impressive performance. However, it is known that these architectures are encumbered by issues that constrain their performance, such as over-squashing and over-smoothing. Meanwhile, Transformers have demonstrated exceptional computational capacity to effectively address challenges related to long-range dependencies. Consequently, we introduce Todyformer, a novel Transformer-based neural network tailored for dynamic graphs. It unifies the local encoding capacity of Message-Passing Neural Networks (MPNNs) with the global encoding of Transformers through i) a novel patchifying paradigm for dynamic graphs to alleviate over-squashing, ii) a structure-aware parametric tokenization strategy leveraging MPNNs, iii) a Transformer with temporal…
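The patchifying idea in item i) can be illustrated with a minimal sketch. It assumes a dynamic graph is given as a list of timestamped edges and that a "patch" is a contiguous, time-ordered chunk of those edges of roughly equal size. The name `patchify`, the `Edge` tuple layout, and the equal-size splitting rule are illustrative assumptions, not taken from the abstract, and the paper's actual scheme may differ.

```python
from typing import List, Tuple

# Hypothetical edge layout: (source node, destination node, timestamp).
Edge = Tuple[int, int, float]


def patchify(edges: List[Edge], num_patches: int) -> List[List[Edge]]:
    """Split a timestamped edge list into contiguous, time-ordered patches.

    Assumption: patches hold a near-equal number of edges; any remainder
    is spread over the earliest patches.
    """
    if num_patches <= 0:
        raise ValueError("num_patches must be positive")
    ordered = sorted(edges, key=lambda e: e[2])  # sort by timestamp
    base, extra = divmod(len(ordered), num_patches)
    patches: List[List[Edge]] = []
    start = 0
    for i in range(num_patches):
        size = base + (1 if i < extra else 0)
        patches.append(ordered[start:start + size])
        start += size
    return patches


edges = [(0, 1, 5.0), (1, 2, 1.0), (2, 3, 3.0), (0, 3, 2.0), (3, 4, 4.0)]
patches = patchify(edges, 2)
for index, patch in enumerate(patches):
    print(index, patch)
```

In a design of this kind, each patch would presumably be encoded locally (for example by an MPNN) to produce node tokens, which a Transformer then relates across patches; that step is not shown here.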
Taxonomy
Topics: Model-Driven Software Engineering Techniques · Graph Theory and Algorithms · Scientific Computing and Data Management
Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Softmax · Byte Pair Encoding · Linear Layer · Multi-Head Attention · Residual Connection
