FragmentNet: Adaptive Graph Fragmentation for Graph-to-Sequence Molecular Representation Learning
Ankur Samanta, Rohan Gupta, Aditi Misra, Christian McIntosh Clarke, Jayakumar Rajadas

TL;DR
FragmentNet introduces an adaptive, learned tokenizer that splits molecular graphs into chemically valid fragments while preserving their connectivity. It outperforms similarly sized models on property prediction, stays competitive with larger ones, and keeps its fragment-level representations interpretable.
Contribution
The paper presents FragmentNet, a novel graph-to-sequence model with an adaptive tokenizer that enhances molecular representation learning through chemically valid fragmentation and hierarchical embeddings.
Findings
Outperforms similar-sized models on MoleculeNet tasks
Achieves competitive results with larger state-of-the-art models
Enables interpretability and editing of molecular fragments
Abstract
Molecular property prediction uses molecular structure to infer chemical properties. Chemically interpretable representations that capture meaningful intramolecular interactions enhance the usability and effectiveness of these predictions. However, existing methods often rely on atom-based or rule-based fragment tokenization, which can be chemically suboptimal and lack scalability. We introduce FragmentNet, a graph-to-sequence foundation model with an adaptive, learned tokenizer that decomposes molecular graphs into chemically valid fragments while preserving structural connectivity. FragmentNet integrates VQVAE-GCN for hierarchical fragment embeddings, spatial positional encodings for graph serialization, global molecular descriptors, and a transformer. Pre-trained with Masked Fragment Modeling and fine-tuned on MoleculeNet tasks, FragmentNet outperforms models with similarly scaled…
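The Masked Fragment Modeling objective named in the abstract can be illustrated with a minimal sketch of the masking step alone. This is not the authors' implementation: the function name `mask_fragments`, the reserved `MASK_ID`, the 15% default mask rate, and the integer fragment ids are all assumptions made for illustration, and the `-100` ignore value simply follows the common convention used by cross-entropy losses such as PyTorch's.

```python
import random

MASK_ID = 0    # hypothetical id reserved for the [MASK] token
IGNORE = -100  # assumed "ignore" label for positions that are not masked


def mask_fragments(fragment_ids, mask_rate=0.15, seed=None):
    """Hide a random subset of fragment tokens for a masked-prediction task.

    Returns (masked_ids, labels). labels holds the original id at each
    masked position and IGNORE everywhere else, so a loss can be computed
    only on the fragments the model has to reconstruct.
    """
    rng = random.Random(seed)
    n = len(fragment_ids)
    if n == 0:
        return [], []
    # Mask at least one fragment so short molecules still give a signal.
    k = max(1, round(n * mask_rate))
    positions = set(rng.sample(range(n), k))
    masked = [MASK_ID if i in positions else t
              for i, t in enumerate(fragment_ids)]
    labels = [t if i in positions else IGNORE
              for i, t in enumerate(fragment_ids)]
    return masked, labels


# Usage: a serialized molecule as ten hypothetical fragment ids.
fragments = [5, 9, 12, 7, 3, 8, 11, 4, 6, 10]
masked, labels = mask_fragments(fragments, mask_rate=0.2, seed=0)
```

In a full pipeline the masked sequence would be fed to the transformer and the labels used as prediction targets; how FragmentNet selects and replaces masked fragments in practice is not specified in this summary.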
Taxonomy
Topics: Machine Learning in Bioinformatics · Genomics and Phylogenetic Studies · Bioinformatics and Genomic Networks
