Learning Graph Quantized Tokenizers
Limei Wang, Kaveh Hassani, Si Zhang, Dongqi Fu, Baichuan Yuan, Weilin Cong, Zhigang Hua, Hao Wu, Ning Yao, Bo Long

TL;DR
This paper introduces GQT, a graph tokenizer trained with multi-task self-supervised learning and hierarchical quantization. It improves graph representation and achieves state-of-the-art results on 20 of 22 benchmarks.
Contribution
GQT is the first graph tokenizer whose training is decoupled from Transformer training. It uses Residual Vector Quantization (RVQ) to produce hierarchical tokens, and it improves both performance and efficiency in graph learning tasks.
Findings
GQT achieves state-of-the-art results on 20 out of 22 benchmarks.
Hierarchical quantization reduces memory and improves generalization.
Decoupling tokenizer training from Transformer training enhances robustness.
Abstract
Transformers serve as the backbone architectures of Foundational Models, where domain-specific tokenizers allow them to adapt to various domains. Graph Transformers (GTs) have recently emerged as leading models in geometric deep learning, outperforming Graph Neural Networks (GNNs) in various graph learning tasks. However, the development of tokenizers for graphs has lagged behind other modalities. To address this, we introduce GQT (Graph Quantized Tokenizer), which decouples tokenizer training from Transformer training by leveraging multi-task graph self-supervised learning, yielding robust and generalizable graph tokens. Furthermore, the GQT utilizes Residual Vector Quantization (RVQ) to learn hierarchical discrete tokens, resulting in significantly reduced memory requirements and improved generalization capabilities. By combining the GQT with token…
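The residual quantization idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the codebooks here are random rather than learned with self-supervised objectives, the nearest-codeword rule uses plain squared Euclidean distance, and the names `nearest`, `rvq_encode`, `rvq_decode`, and all sizes are hypothetical choices made for illustration.

```python
import random


def nearest(codebook, vec):
    """Return the index of the codeword closest to vec (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], vec)),
    )


def rvq_encode(vec, codebooks):
    """Quantize vec into one code index per codebook level.

    The first level quantizes vec itself. Each later level quantizes the
    residual left by the previous levels, so the codes go from coarse to fine.
    """
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        k = nearest(codebook, residual)
        codes.append(k)
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct an approximation as the sum of the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for k, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[k])]
    return out


# Hypothetical sizes; a real tokenizer would learn the codebooks instead.
rng = random.Random(0)
dim, levels, codebook_size = 8, 3, 16
codebooks = [
    [[rng.gauss(0.0, 1.0 / (level + 1)) for _ in range(dim)]
     for _ in range(codebook_size)]
    for level in range(levels)
]
node_embedding = [rng.gauss(0.0, 1.0) for _ in range(dim)]

codes = rvq_encode(node_embedding, codebooks)
approx = rvq_decode(codes, codebooks)
print(codes)  # one integer code per level
```

Under these assumptions, each node is stored as `levels` small integers instead of a `dim`-dimensional float vector, which is the intuition behind the memory reduction the abstract describes.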
Peer Reviews
Decision·ICLR 2025 Poster
1. The paper is well written and easy to follow.
2. The experiments demonstrate strong results compared with baseline methods.
3. The motivation and methodology are clear.
NO
- The design of tokenizers in graph Transformers is an important research topic. The related-work section describes existing graph Transformers and tokenizers well.
- The paper is easy to follow.
- The experiments demonstrate that the proposed method is effective across homophilous, heterophilous, and large-scale datasets.
- I think that the novelty of the paper is limited.
- One of the main contributions highlighted in this paper is quantized tokenization for graph Transformers. However, the paper simply combines graph neural networks for dealing with graph-structured data and existing residual vector quantization for the quantization.
- Another contribution is multi-task self-supervised learning. The paper uses two self-supervised learning objectives such as DGI and GMAE2, which are not proposed self-supe
- This paper decouples the graph tokenizer from Transformer training, which opens an interesting direction for developing graph Transformers.
- The proposed method achieves competitive performance on node classification, including heterophilic and homophilic datasets.
- GQT scales to benchmarks with large graphs.
- The residual vector quantization is the core module in GQT, but the authors did not provide a sufficient description of it, in the main text or in the appendix. How does GQT learn the mapping from graph nodes to the codebook tokens? The mapping relation $X_Q$ is introduced in L194/199 and never used again in the following context.
- The writing of section 5.2 is rather arbitrary. I cannot fully comprehend the embedding matrix $X_T$, which is 'trained end-to-end with the Transformer'. How does $X_T$ be learned in GQ
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Topic Modeling
Methods: Dropout · Layer Normalization · Adam · Attention Is All You Need · Dense Connections · Residual Connection · Position-Wise Feed-Forward Layer · Linear Layer · Byte Pair Encoding · Absolute Position Encodings
