Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules
Zhiyuan Liu, Yaorui Shi, An Zhang, Enzhi Zhang, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua

TL;DR
This paper critically examines and enhances the components of masked graph modeling for molecules, introducing a novel method that improves self-supervised molecular graph representations by focusing on tokenizer and decoder design.
Contribution
The study analyzes the roles of the tokenizer and the decoder in masked graph modeling (MGM) and proposes a new method, SimSGT, that pairs a simple GNN tokenizer with effective decoding and outperforms existing approaches.
Findings
A subgraph-level tokenizer significantly improves encoder representations.
Expressive decoders with remask decoding enhance learning.
SimSGT outperforms previous self-supervised molecular learning methods.
Abstract
Masked graph modeling excels in the self-supervised representation learning of molecular graphs. Scrutinizing previous studies, we can reveal a common scheme consisting of three key components: (1) graph tokenizer, which breaks a molecular graph into smaller fragments (i.e., subgraphs) and converts them into tokens; (2) graph masking, which corrupts the graph with masks; (3) graph autoencoder, which first applies an encoder on the masked graph to generate the representations, and then employs a decoder on the representations to recover the tokens of the original graph. However, the previous MGM studies focus extensively on graph masking and encoder, while there is limited understanding of tokenizer and decoder. To bridge the gap, we first summarize popular molecule tokenizers at the granularity of node, edge, motif, and Graph Neural Networks (GNNs), and then examine their roles as the…
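The three-component scheme described in the abstract can be sketched on a toy graph. This is a minimal illustration only, not the authors' implementation: the toy molecule, the `[MASK]` placeholder, and every function name here are hypothetical, the tokenizer is the simplest node-level kind (one token per atom), and the "encoder" is a hand-written neighbor aggregation standing in for a trained GNN.

```python
import random

# Hypothetical toy molecule: heavy atoms of ethanol, C-C-O.
ATOMS = ["C", "C", "O"]
BONDS = [(0, 1), (1, 2)]
MASK = "[MASK]"  # assumed placeholder label for masked atoms


def tokenize_nodes(atoms):
    """(1) Node-level tokenizer: each atom's element symbol is its token."""
    return list(atoms)


def mask_graph(atoms, mask_ratio, rng):
    """(2) Graph masking: replace a random subset of atoms with MASK."""
    num_masked = max(1, round(mask_ratio * len(atoms)))
    masked_idx = sorted(rng.sample(range(len(atoms)), num_masked))
    corrupted = [MASK if i in masked_idx else a for i, a in enumerate(atoms)]
    return corrupted, masked_idx


def encode(corrupted, bonds):
    """(3a) Stand-in encoder: one round of neighbor aggregation.

    Each node's "representation" is its own label plus the sorted labels
    of its neighbors. A real encoder would be a trained GNN.
    """
    neighbors = {i: [] for i in range(len(corrupted))}
    for u, v in bonds:
        neighbors[u].append(corrupted[v])
        neighbors[v].append(corrupted[u])
    return [(corrupted[i], tuple(sorted(neighbors[i])))
            for i in range(len(corrupted))]


def reconstruction_targets(tokens, masked_idx):
    """(3b) Tokens the decoder is trained to recover at masked positions."""
    return {i: tokens[i] for i in masked_idx}


rng = random.Random(0)
tokens = tokenize_nodes(ATOMS)
corrupted, masked_idx = mask_graph(ATOMS, mask_ratio=0.35, rng=rng)
representations = encode(corrupted, BONDS)
targets = reconstruction_targets(tokens, masked_idx)
print(corrupted, masked_idx, targets)
```

In this sketch the tokenizer only defines the reconstruction targets; swapping `tokenize_nodes` for an edge-, motif-, or GNN-based tokenizer would change what the decoder must predict while leaving the masking and encoding steps untouched.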
Taxonomy
Topics: Advanced Graph Neural Networks · Machine Learning in Materials Science · Computational Drug Discovery Methods
