TL;DR
This paper introduces h-MINT, a hierarchical molecular interaction network that uses a novel overlapping fragment tokenization method to improve molecular representations for drug discovery tasks.
Contribution
The work presents a new overlapping fragment tokenization scheme and a hierarchical model that jointly captures atom and fragment interactions, addressing limitations of previous methods.
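To make "overlapping fragments" concrete, here is a toy sketch in Python. It is not the paper's OverlapBPE algorithm, whose details are not given in this summary. It only illustrates the underlying idea, assuming fragments are represented as sets of atom indices: one atom may belong to several fragments, and fragments that share atoms can be linked at the fragment level. The function names and the naphthalene-like atom numbering are hypothetical choices made for this illustration.

```python
from collections import defaultdict
from itertools import combinations


def atom_memberships(fragments):
    """Map each atom index to the list of fragment ids that contain it."""
    membership = defaultdict(list)
    for frag_id, atoms in enumerate(fragments):
        for atom in sorted(atoms):
            membership[atom].append(frag_id)
    return dict(membership)


def fragment_edges(fragments):
    """Link two fragments when they share at least one atom.

    Returns {(i, j): sorted list of shared atom indices} for i < j.
    """
    edges = {}
    for (i, a), (j, b) in combinations(enumerate(fragments), 2):
        shared = set(a) & set(b)
        if shared:
            edges[(i, j)] = sorted(shared)
    return edges


# Hypothetical numbering of a naphthalene-like skeleton: two fused
# six-membered rings that share atoms 4 and 5.
ring_a = {0, 1, 2, 3, 4, 5}
ring_b = {4, 5, 6, 7, 8, 9}
fragments = [ring_a, ring_b]

membership = atom_memberships(fragments)
edges = fragment_edges(fragments)

# Atoms that sit in more than one fragment (the overlap region).
overlap_atoms = sorted(a for a, frags in membership.items() if len(frags) > 1)
print(overlap_atoms)
print(edges)
```

A disjoint partition would have to assign atoms 4 and 5 to only one of the two rings, so the other fragment would no longer be a complete ring. Allowing overlap keeps both rings intact, and the shared atoms give a natural link between the atom level and the fragment level.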
Findings
Improves binding affinity prediction by 2-4% on PDBBind and LBA datasets.
Enhances virtual screening performance by 1-3% on DUD-E and LIT-PCBA.
Achieves top high-throughput screening (HTS) performance on PubChem assays.
Abstract
Accurate molecular representations are critical for drug discovery, and a central challenge lies in capturing the chemical environment of molecular fragments, as key interactions, such as H-bond and π stacking, occur only under specific local conditions. Most existing approaches represent molecules as atom-level graphs; however, atom-level representations can hardly express higher-order chemical context (e.g., stereochemistry, lone pairs, conjugation). Fragment-based methods (e.g., principal subgraph, predefined functional groups) fail to preserve essential information such as chirality, aromaticity, and ionic states. This work addresses these limitations from two aspects. (i) OverlapBPE tokenization. We propose a novel data-driven molecule tokenization method. Unlike existing approaches, our method allows overlapping fragments, reflecting the inherently fuzzy boundaries of…