The chemical space of terpenes: insights from data science and AI
Morteza Hosseini, David M. Pereira

TL;DR
This study employs data science and AI techniques to systematically analyze and classify a large dataset of nearly 60,000 terpenes, revealing their chemical diversity and demonstrating effective clustering and classification methods.
Contribution
It introduces a comprehensive data-driven framework for characterizing terpene chemical space and compares various clustering and classification algorithms for this purpose.
Findings
High subclass classification performance, with reported metrics above 0.9
Effective clustering with PCA, t-SNE, UMAP, and other methods
Systematic analysis of chemical and physical properties of terpenes
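As a rough illustration of the dimensionality-reduction step named in the findings, the sketch below runs a minimal PCA (first principal component only) on a tiny descriptor table using just the Python standard library. The descriptor columns and all numeric values are hypothetical placeholders, not data from the paper or from COCONUT, and the power-iteration approach is an assumption made for brevity, not the authors' implementation.

```python
import math

# Hypothetical descriptor rows: (molecular weight, heavy atoms, ring count).
# Illustrative placeholder values only; not taken from the paper or COCONUT.
descriptors = [
    [136.2, 10, 1],
    [154.3, 11, 1],
    [204.4, 15, 2],
    [222.4, 16, 2],
    [272.5, 20, 3],
    [426.7, 31, 5],
]


def standardize(rows):
    """Scale every column to zero mean and unit sample variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def covariance(rows):
    """Sample covariance matrix; assumes the columns are already centered."""
    n, d = len(rows), len(rows[0])
    return [
        [sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iterations=200):
    """Approximate the dominant eigenvector of cov by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


z = standardize(descriptors)
pc1 = first_component(covariance(z))

# Project each molecule onto the first principal component.
scores = [sum(a * b for a, b in zip(row, pc1)) for row in z]
```

Here `scores` gives each molecule a single coordinate along the direction of greatest variance. In practice one would more likely compute many descriptors or fingerprints and use an established library for PCA, t-SNE, or UMAP; this version only shows the underlying idea.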
Abstract
Terpenes are a widespread class of natural products with significant chemical and biological diversity, and many of these molecules have already made their way into medicines. Given the thousands of molecules already described, fully characterizing this chemical space can be challenging when relying on classical approaches. In this work we employ a data science-based approach to systematically identify, compile and characterize the diversity of terpenes currently known. We worked with a natural product database, COCONUT, from which we extracted information for nearly 60,000 terpenes. For these molecules, we conducted a subclass-by-subclass analysis in which we highlight chemical and physical properties relevant to several fields, such as natural products chemistry, medicinal chemistry and drug discovery, among others. We were also interested in assessing the…
Taxonomy
Topics: Computational Drug Discovery Methods · Plant biochemistry and biosynthesis · Molecular spectroscopy and chirality
Methods: Principal Components Analysis
