Clustering Bioactive Molecules in 3D Chemical Space with Unsupervised Deep Learning
Chu Qin, Ying Tan, Shang Ying Chen, Xian Zeng, Xingxing Qi, Tian Jin, Huan Shi, Yiwei Wan, Yu Chen, Jingfeng Li, Weidong He, Yali Wang, Peng Zhang, Feng Zhu, Hongping Zhao, Yuyang Jiang, Yuzong Chen

TL;DR
This paper presents a deep autoencoder-based method for unsupervised clustering of 1.39 million bioactive molecules in 3D chemical space, revealing meaningful sub-structural features and bioactivity patterns.
Contribution
It introduces a deep learning approach for large-scale clustering of bioactive molecules that uncovers structural and activity-based subgroups not captured by traditional similarity-based methods.
Findings
Successfully clustered 1.39 million molecules into meaningful band-clusters
Revealed sub-structural features associated with bioactivity classes
Demonstrated applicability to big data clustering tasks
Abstract
Unsupervised clustering has broad applications in data stratification, pattern investigation and new discovery beyond existing knowledge. In particular, clustering of bioactive molecules facilitates chemical space mapping, structure-activity studies, and drug discovery. These tasks, conventionally conducted by similarity-based methods, are complicated by data complexity and diversity. We explored the superior learning capability of deep autoencoders for unsupervised clustering of 1.39 million bioactive molecules into band-clusters in a 3-dimensional latent chemical space. These band-clusters, displayed with space-navigation simulation software, band molecules of selected bioactivity classes into individual band-clusters possessing unique sets of common sub-structural features beyond structural similarity. These sub-structural features form the frameworks of the literature-reported…
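The core idea in the abstract, compressing each molecule into a point in a 3-dimensional latent space with an autoencoder so that structurally related molecules land near each other, can be illustrated with a minimal sketch. This is not the authors' network: the abstract does not give their input representation, depth, or training setup. Everything below is an assumption made for illustration: the 8-bit toy fingerprints, the two hypothetical "scaffold families", the single tanh hidden layer, the learning rate, and the names `TinyAutoencoder` and `make_toy_fingerprints`.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class TinyAutoencoder:
    """Hypothetical autoencoder with a 3-D bottleneck (illustration only).

    Encoder: z = tanh(W1 x + b1); decoder: x_hat = sigmoid(W2 z + b2).
    """

    def __init__(self, n_in, n_latent=3, seed=0):
        rng = random.Random(seed)
        self.n_in, self.n_latent = n_in, n_latent
        self.W1 = [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_latent)]
        self.b1 = [0.0] * n_latent
        self.W2 = [[rng.gauss(0, 0.5) for _ in range(n_latent)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        # Map a fingerprint to a point in the 3-D latent "chemical space".
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W1, self.b1)]

    def decode(self, z):
        # Reconstruct per-bit probabilities from the latent point.
        return [sigmoid(sum(w * zi for w, zi in zip(row, z)) + b)
                for row, b in zip(self.W2, self.b2)]

    def loss(self, data):
        # Mean binary cross-entropy between fingerprints and reconstructions.
        eps = 1e-9
        total = 0.0
        for x in data:
            x_hat = self.decode(self.encode(x))
            total -= sum(xi * math.log(p + eps) + (1 - xi) * math.log(1 - p + eps)
                         for xi, p in zip(x, x_hat))
        return total / len(data)

    def train_step(self, x, lr=0.1):
        z = self.encode(x)
        x_hat = self.decode(z)
        # For sigmoid output + binary cross-entropy, dLoss/dlogit = x_hat - x.
        d_out = [p - xi for p, xi in zip(x_hat, x)]
        # Backpropagate to the latent layer using W2 before it is updated.
        d_z = [sum(self.W2[i][j] * d_out[i] for i in range(self.n_in))
               for j in range(self.n_latent)]
        d_h = [dz * (1 - zj * zj) for dz, zj in zip(d_z, z)]  # tanh derivative
        for i in range(self.n_in):
            for j in range(self.n_latent):
                self.W2[i][j] -= lr * d_out[i] * z[j]
            self.b2[i] -= lr * d_out[i]
        for j in range(self.n_latent):
            for k in range(self.n_in):
                self.W1[j][k] -= lr * d_h[j] * x[k]
            self.b1[j] -= lr * d_h[j]


def make_toy_fingerprints(n_per_family=20, n_bits=8, seed=1):
    """Hypothetical data: two families, each favouring one half of the bits."""
    rng = random.Random(seed)
    data = []
    for family in (0, 1):
        for _ in range(n_per_family):
            fp = []
            for b in range(n_bits):
                in_family = (b < n_bits // 2) == (family == 0)
                fp.append(1 if rng.random() < (0.9 if in_family else 0.1) else 0)
            data.append(fp)
    return data


data = make_toy_fingerprints()
ae = TinyAutoencoder(n_in=len(data[0]))
before = ae.loss(data)
rng = random.Random(2)
for epoch in range(200):
    order = list(range(len(data)))
    rng.shuffle(order)  # stochastic gradient descent over shuffled samples
    for idx in order:
        ae.train_step(data[idx])
after = ae.loss(data)
coords = [ae.encode(x) for x in data]  # one 3-D point per molecule
print(f"reconstruction loss {before:.3f} -> {after:.3f}")
```

After training, `coords` holds one 3-D point per toy molecule, which could then be plotted or inspected for groupings. The 3-unit bottleneck is the design choice that mirrors the abstract: it forces the network to keep only the features most useful for reconstruction, which is what makes the latent space viewable in three dimensions. How the paper's band-clusters were actually identified and navigated is not specified in this summary.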
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Metabolomics and Mass Spectrometry Studies
