In Silico Functional Profiling of Engineered Small Molecules: A Machine Learning Approach Leveraging PubChem Identifiers (CID_SID ML model)
Mariya L. Ivanova, Michael Nicholls, Nicola Russo, Gueorgui Mihaylov, Konstantin Nikolic

TL;DR
This paper presents a machine learning framework using PubChem identifiers for rapid, scalable in silico profiling of small molecules, demonstrating comparable performance to traditional methods with significantly reduced computational time.
Contribution
The study introduces a novel ML approach leveraging PubChem IDs that eliminates the need for molecular descriptor calculation, enabling faster and scalable drug profiling.
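As a rough illustration of what skipping descriptor calculation could look like in practice, the sketch below builds a feature matrix directly from the two identifiers. It assumes the identifiers are used as plain numeric input features; the record layout, the field names (`cid`, `sid`, `active`), and the identifier values are hypothetical placeholders, not data from the paper.

```python
from typing import NamedTuple


class AssayRecord(NamedTuple):
    cid: int      # PubChem compound identifier
    sid: int      # PubChem substance identifier
    active: bool  # bioassay outcome label


def to_features(records):
    """Return (X, y): each X row is [cid, sid]; y is 1 for active, 0 otherwise."""
    X = [[r.cid, r.sid] for r in records]
    y = [1 if r.active else 0 for r in records]
    return X, y


# Placeholder records; the identifier values are made up for illustration.
records = [
    AssayRecord(cid=1001, sid=5001, active=True),
    AssayRecord(cid=1002, sid=5002, active=False),
]
X, y = to_features(records)
```

No structure parsing or fingerprint generation takes place here; `X` and `y` could be handed to any standard classifier.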
Findings
The CID_SID ML model is significantly faster (3.3 s) than descriptor-based models (~109 s), roughly a 33-fold difference.
The model achieves high accuracy (83.52%) and precision (89.62%) across diverse bioassays.
Performance varies across methods, with no single approach being universally superior.
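For readers unfamiliar with the reported metrics, the following minimal sketch shows how accuracy and precision are derived from confusion-matrix counts. The counts in the example are hypothetical and are not the paper's results.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no predictions to score")
    return (tp + tn) / total


def precision(tp, fp):
    """Fraction of predicted actives that are truly active."""
    if tp + fp == 0:
        raise ValueError("no positive predictions to score")
    return tp / (tp + fp)


# Hypothetical counts, chosen only to exercise the functions.
acc = accuracy(tp=80, fp=10, tn=70, fn=40)   # 150 / 200
prec = precision(tp=80, fp=10)               # 80 / 90
```

Precision can exceed accuracy, as in the reported figures, when a model makes few false-positive calls but misses some true actives.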
Abstract
The article introduces a concept for a time- and cost-effective methodological framework leveraging machine learning (ML) models for both early-stage drug development and clinical trial support. The rationale for this approach is the inherent scalability and speed enabled by using pre-calculated data embedded in existing PubChem identifiers (CID and SID), thereby eliminating the computationally intensive step of on-the-fly molecular descriptor generation. The approach was effectively demonstrated across four diverse bioassays: antagonists of the human D3 dopamine receptor, Rab9 promoter activators, small-molecule inhibitors of CHOP, and antagonists of the human M1 muscarinic receptor. A comparison, based on the Matthews correlation coefficient (MCC), was conducted between the CID_SID ML model, the MORGAN2-based ML model, and the RDKit-transformed SMILES model for these four case studies,…
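The Matthews correlation coefficient used for the comparison is defined from confusion-matrix counts as MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below computes it from label lists and ranks several models by it; the model names are taken from the abstract, but the labels and predictions are hypothetical toy values, and returning 0.0 for a zero denominator is a common convention assumed here rather than something the source states.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Toy ground truth and predictions, for illustration only.
y_true = [1, 1, 0, 0]
predictions = {
    "CID_SID": [1, 1, 0, 0],
    "MORGAN2": [1, 0, 0, 0],
    "RDKit SMILES": [0, 0, 1, 1],
}
scores = {name: mcc(y_true, y_pred) for name, y_pred in predictions.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and it remains informative on the imbalanced active/inactive splits typical of bioassay data.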
Taxonomy
Topics: Computational Drug Discovery Methods · Metabolomics and Mass Spectrometry Studies
