Learning Invariant Molecular Representation in Latent Discrete Space
Xiang Zhuang, Qiang Zhang, Keyan Ding, Yatao Bian, Xiao Wang, Jingsong Lv, Hongyang Chen, Huajun Chen

TL;DR
This paper introduces a molecular representation learning framework that improves out-of-distribution generalization by identifying invariant features in a latent space, using residual vector quantization and a self-supervised learning objective.
Contribution
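It proposes a 'first-encoding-then-separation' strategy, which identifies invariant features after encoding rather than before, combined with residual vector quantization and a task-agnostic self-supervised objective, to learn molecular representations that are robust to distribution shifts.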
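Residual vector quantization is a standard technique: a vector is snapped to its nearest codeword, the leftover residual is quantized with a second codebook, and so on, with the reconstruction being the sum of the selected codewords. The sketch below shows that generic multi-stage version in plain Python under stated assumptions: the codebooks are hypothetical toy values rather than learned ones, no training (codebook updates or gradient estimation) is shown, and the paper's module may combine quantized and continuous features differently.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(z: Sequence[float], codebook: Sequence[Vector]) -> int:
    """Index of the codeword closest to z (single-stage vector quantization)."""
    return min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))


def residual_quantize(
    z: Sequence[float], codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], Vector]:
    """Generic multi-stage residual VQ (illustrative, not the paper's code).

    Each stage quantizes whatever the previous stages failed to capture.
    Returns the chosen codeword index per stage and the summed reconstruction.
    """
    residual = list(z)
    recon = [0.0] * len(z)
    indices: List[int] = []
    for codebook in codebooks:
        k = nearest_code(residual, codebook)
        indices.append(k)
        code = codebook[k]
        recon = [r + c for r, c in zip(recon, code)]        # add chosen codeword
        residual = [r - c for r, c in zip(residual, code)]  # what is still unexplained
    return indices, recon


# Hypothetical toy codebooks: a coarse stage followed by a finer stage.
TOY_CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],
]

if __name__ == "__main__":
    z = [1.2, 0.8]
    idx, recon = residual_quantize(z, TOY_CODEBOOKS)
    print(idx, recon)  # [1, 1] [1.25, 0.75]
    print(squared_distance(z, recon))
```

With only the coarse codebook the input `[1.2, 0.8]` is reconstructed as `[1.0, 1.0]`; the second stage quantizes the remaining residual and brings the reconstruction to `[1.25, 0.75]`, which is closer to the input. Stacking small codebooks this way keeps the representation discrete while recovering more detail than a single codebook of the same size.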
Findings
Outperforms state-of-the-art methods on 18 molecular datasets.
Achieves stronger generalization under distribution shifts.
Effective across task types, including regression and multi-label classification.
Abstract
Molecular representation learning lays the foundation for drug discovery. However, existing methods suffer from poor out-of-distribution (OOD) generalization, particularly when data for training and testing originate from different environments. To address this issue, we propose a new framework for learning molecular representations that exhibit invariance and robustness against distribution shifts. Specifically, we propose a strategy called "first-encoding-then-separation" to identify invariant molecule features in the latent space, which deviates from conventional practices. Prior to the separation step, we introduce a residual vector quantization module that mitigates the over-fitting to training data distributions while preserving the expressivity of encoders. Furthermore, we design a task-agnostic self-supervised learning objective to encourage precise invariance identification,…
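The abstract does not say how the separation step is parameterized. One common design for splitting a latent vector into invariant and environment-specific parts is a learned soft mask; the sketch below assumes that design purely for illustration. The names `separate`, `stable_sigmoid`, and `gate_logits` are hypothetical, the logits are fixed numbers standing in for a learned gating network, and the paper's task-agnostic self-supervised objective (which would train such a gate) is not modeled. What it does show is the ordering in the strategy's name: the mask is applied to an already-encoded vector rather than to the input molecule.

```python
import math
from typing import List, Sequence, Tuple


def stable_sigmoid(x: float) -> float:
    """Logistic function written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def separate(
    z: Sequence[float], gate_logits: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Split an encoded vector into a hypothetical 'invariant' and 'spurious' part.

    A soft mask m = sigmoid(gate_logits) keeps m * z as the invariant part and
    (1 - m) * z as the environment-dependent remainder; the two parts sum to z.
    In a trained model the logits would come from a learned gating network.
    """
    mask = [stable_sigmoid(g) for g in gate_logits]
    invariant = [m * x for m, x in zip(mask, z)]
    spurious = [(1.0 - m) * x for m, x in zip(mask, z)]
    return invariant, spurious


if __name__ == "__main__":
    # Separation acts on the latent vector produced by the encoder,
    # not on the input molecular graph ("first encoding, then separation").
    latent = [2.0, 3.0, 4.0]     # toy encoder output
    logits = [10.0, -10.0, 0.0]  # keep dim 0, drop dim 1, split dim 2 evenly
    inv, spu = separate(latent, logits)
    print([round(v, 3) for v in inv])
    print([round(v, 3) for v in spu])
```

Because the two parts always sum to the original vector in this sketch, the split itself discards nothing; which dimensions end up treated as invariant is decided entirely by the gate, so in a real model the training objective, not the masking operation, carries the burden of finding features that stay stable across environments.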
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Machine Learning in Bioinformatics
