Multi-scale Graph Autoregressive Modeling: Molecular Property Prediction via Next Token Prediction
Zhuoyang Jiang, Yaosen Min, Peiran Jin, Lei Chen

TL;DR
This paper introduces Connection-Aware Motif Sequencing (CamS), a graph-to-sequence representation that serializes molecular graphs into hierarchical motif sequences so that decoder-only Transformers can learn them via next-token prediction, achieving state-of-the-art property prediction results on MoleculeNet and MoleculeACE.
Contribution
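CamS bridges the gap between SMILES-based next-token prediction, which scales well but lacks explicit topology, and graph-native masked modeling, which captures connectivity but can corrupt key chemical details, by serializing molecular graphs into hierarchical, structure-rich sequences for transformer-based learning.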
Findings
Achieves state-of-the-art performance on MoleculeNet and MoleculeACE benchmarks.
Effectively captures chemical activity cliffs through hierarchical motif serialization.
Enables hierarchical modeling from local to global molecular structures.
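To make the local-to-global idea concrete, here is a minimal sketch of how per-scale motif sequences might be concatenated from fine to coarse and turned into next-token prediction pairs. The special tokens (`<bos>`, `<eos>`, `<scale>`), the function names, and the toy motif strings are all hypothetical; the paper's actual vocabulary and separators are not given in this summary.

```python
from typing import Dict, List, Tuple

# Hypothetical special tokens; not taken from the paper.
BOS, EOS, SCALE_SEP = "<bos>", "<eos>", "<scale>"


def concat_fine_to_coarse(scale_sequences: Dict[int, List[str]]) -> List[str]:
    """Concatenate per-scale motif sequences, finest scale first.

    Assumption: a smaller key means a finer motif scale.
    """
    tokens = [BOS]
    for i, scale in enumerate(sorted(scale_sequences)):
        if i > 0:
            tokens.append(SCALE_SEP)  # mark the boundary between scales
        tokens.extend(scale_sequences[scale])
    tokens.append(EOS)
    return tokens


def next_token_pairs(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """Build (prefix, target) pairs for causal next-token prediction."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# Toy input: three scales of the same molecule, written as made-up motif tokens.
example = {0: ["C", "C", "O"], 1: ["CC", "O"], 2: ["CCO"]}
sequence = concat_fine_to_coarse(example)
pairs = next_token_pairs(sequence)
```

Because the coarse tokens come last, every coarse-scale prediction is conditioned on the complete fine-scale prefix, which is the property the findings describe.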
Abstract
We present Connection-Aware Motif Sequencing (CamS), a graph-to-sequence representation that enables decoder-only Transformers to learn molecular graphs via standard next-token prediction (NTP). For molecular property prediction, SMILES-based NTP scales well but lacks explicit topology, whereas graph-native masked modeling captures connectivity but risks disrupting the pivotal chemical details (e.g., activity cliffs). CamS bridges this gap by serializing molecular graphs into structure-rich causal sequences. CamS first mines data-driven connection-aware motifs. It then serializes motifs via scaffold-rooted breadth-first search (BFS) to establish a stable core-to-periphery order. Crucially, CamS enables hierarchical modeling by concatenating sequences from fine to coarse motif scales, allowing the model to condition global scaffolds on dense, uncorrupted local structural evidence. We…
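The scaffold-rooted BFS step described in the abstract can be illustrated with a small sketch. It assumes the motifs have already been mined and are given as a hand-written adjacency map; the motif names, the choice of root, and the alphabetical tie-break among neighbours are illustrative assumptions, not details confirmed by the paper.

```python
from collections import deque
from typing import Dict, List


def scaffold_rooted_bfs(adjacency: Dict[str, List[str]], root: str) -> List[str]:
    """Return motif ids in breadth-first order, starting at the scaffold motif."""
    order: List[str] = []
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        # Sorting neighbours gives a deterministic order; this tie-break
        # is an assumption made for the sketch.
        for neighbour in sorted(adjacency.get(node, [])):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


# Hypothetical motif graph: a ring scaffold with three attached motifs.
motif_graph = {
    "ring": ["amide", "methyl"],
    "amide": ["ring", "hydroxyl"],
    "methyl": ["ring"],
    "hydroxyl": ["amide"],
}
order = scaffold_rooted_bfs(motif_graph, root="ring")
print(order)
```

Starting from the scaffold means motifs directly attached to the core are emitted before more distant ones, giving the core-to-periphery order the abstract refers to.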
Taxonomy
Topics: Advanced Graph Neural Networks · Computational Drug Discovery Methods · Bioinformatics and Genomic Networks
