Tokenizing 3D Molecule Structure with Quantized Spherical Coordinates
Kaiyuan Gao, Yusong Wang, Haoxiang Guan, Zun Wang, Qizhi Pei, John E. Hopcroft, Kun He, Lijun Wu

TL;DR
This paper introduces Mol-StrucTok, a novel method for tokenizing 3D molecular structures using spherical coordinates and vector quantization, enabling efficient and stable 3D molecule generation and property prediction.
Contribution
It presents a new 3D molecule tokenization scheme combining spherical coordinate notation with VQ-VAE, facilitating improved 3D molecular generation and property prediction.
Findings
Faster molecule generation with competitive stability.
Enhanced property prediction accuracy on the QM9 dataset.
Versatile tokenization compatible with various molecular representations.
Abstract
The application of language models (LMs) to molecular structure generation using line notations such as SMILES and SELFIES has been well-established in the field of cheminformatics. However, extending these models to generate 3D molecular structures presents significant challenges. Two primary obstacles emerge: (1) the difficulty in designing a 3D line notation that ensures SE(3)-invariant atomic coordinates, and (2) the non-trivial task of tokenizing continuous coordinates for use in LMs, which inherently require discrete inputs. To address these challenges, we propose Mol-StrucTok, a novel method for tokenizing 3D molecular structures. Our approach comprises two key innovations: (1) We design a line notation for 3D molecules by extracting local atomic coordinates in a spherical coordinate system. This notation builds upon existing 2D line notations and remains agnostic to their…
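The two ingredients named in the abstract, SE(3)-invariant local spherical coordinates and discretization against a codebook, can be illustrated with a minimal sketch. This is not the authors' implementation. The frame construction from three reference atoms, the function names, and the randomly drawn codebook (a stand-in for a learned VQ-VAE codebook, which would normally quantize a learned latent rather than the raw descriptor) are all assumptions made for illustration.

```python
import numpy as np

def local_frame(a, b, c):
    """Orthonormal frame at origin a, built from reference atoms b and c.
    Assumes a, b, c are not collinear (hypothetical frame choice)."""
    x = (b - a) / np.linalg.norm(b - a)
    n = np.cross(x, c - a)
    z = n / np.linalg.norm(n)
    y = np.cross(z, x)
    return np.stack([x, y, z])  # rows are the frame axes

def spherical_descriptor(target, a, b, c):
    """(r, theta, phi) of `target` expressed in the local frame of (a, b, c)."""
    local = local_frame(a, b, c) @ (target - a)
    r = np.linalg.norm(local)
    theta = np.arccos(np.clip(local[2] / r, -1.0, 1.0))  # polar angle
    phi = np.arctan2(local[1], local[0])                 # azimuthal angle
    return np.array([r, theta, phi])

def quantize(vec, codebook):
    """Index of the nearest codebook entry (the lookup step of vector quantization)."""
    return int(np.argmin(np.linalg.norm(codebook - vec, axis=1)))

def random_rotation(rng):
    """Random proper rotation matrix (determinant +1) via QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)

# Toy coordinates (made up for this example, not taken from any dataset).
a = np.array([0.0, 0.0, 0.0])
b = np.array([1.5, 0.0, 0.0])
c = np.array([-0.5, 1.4, 0.0])
target = np.array([0.4, 0.6, 1.1])

# Random stand-in for a learned codebook over (r, theta, phi).
codebook = rng.uniform(low=[0.0, 0.0, -np.pi], high=[3.0, np.pi, np.pi], size=(64, 3))

desc = spherical_descriptor(target, a, b, c)
token = quantize(desc, codebook)

# Apply a random rigid motion (rotation plus translation) to every atom.
R = random_rotation(rng)
t = rng.normal(size=3)
move = lambda p: R @ p + t
desc_moved = spherical_descriptor(move(target), move(a), move(b), move(c))
token_moved = quantize(desc_moved, codebook)

print(desc, token)
print(desc_moved, token_moved)
```

Because the frame rotates and translates together with the molecule, the descriptor, and therefore the token index, should be unchanged by the rigid motion, which is the property the abstract refers to as SE(3) invariance.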
Peer Reviews
Decision: Submitted to ICLR 2025
1. This paper is well-written and easy to follow, with clear and informative tables and figures. 2. The proposed method performs well, especially on the conditional generation task. 3. The ablation study is thorough and provides useful insights.
1. The proposed method is quite similar to existing methods, such as FoldSeek and FoldToken. Specifically, similar to the SE(3)-invariant spherical coordinates here, FoldSeek also uses distances and angles computed based on reference nodes as SE(3)-invariant representations. Furthermore, both methods employ VQ-VAE to learn discrete tokens. These overlapping components limit the novelty of this work. 2. About the datasets: the proposed method is only evaluated on the QM9 dataset, which i
The combination of spherical line notation with vector quantization enables language models to process complex 3D data, which is challenging to discretize. This approach stands out from traditional graph-based or continuous-coordinate models by providing a discrete representation for language models without losing SE(3)-invariant information. Particularly, the augmented tokens incorporate both generation and understanding descriptors, including local spherical coordinates, bond lengths, and angl
### Major
The authors should clarify the rationale behind selecting exactly four neighbors for the atomic descriptor and explicitly address how the descriptor $\mathbf{z}_i$ is defined for atoms with fewer than four neighbors. This is essential, as molecules with varying coordination environments will likely have different numbers of neighbors, impacting the generality of the descriptor across datasets.
### Minor
1. The paper’s notations are somewhat inconsistent and could benefit from simplifi
1. The authors conduct an extensive set of experiments. They measure validity+uniqueness of generated molecules with different bond assignment methods, perform PoseBusters tests, evaluate quantum mechanical properties, and measure MAE for QM9 property prediction. They achieve state-of-the-art results in most experiments. 2. They also perform additional analysis regarding the inference speed of their method and the effect of the generation temperature on balancing quality and diversity.
1. This is a hand-crafted tokenization scheme and should be compared to other tokenizers (e.g. BPE-based tokenizers), not just diffusion models and MPNN-based methods. 2. It may also be helpful to compare with structures expressed in other coordinate systems. I'd imagine that without SE(3) invariance there would be a wider range of possible tokenized sequences, making it harder for the GPT-2 model to learn.
Taxonomy
Topics: Nanofabrication and Lithography Techniques · Diatoms and Algae Research
Methods: Dense Connections · Layer Normalization · Linear Layer · Discriminative Fine-Tuning · Weight Decay · Attention Dropout · Residual Connection · Adam · Attention Is All You Need
