ChemCLIP: Bridging Organic and Inorganic Anticancer Compounds Through Contrastive Learning
Mohamad Koohi-Moghadam, Hongzhe Sun, Hongyan Li, Kyongtae Tyler Bae

TL;DR
ChemCLIP introduces a contrastive learning framework that unifies organic and inorganic anticancer compounds into a shared representation space, enhancing cross-domain knowledge transfer and aiding drug discovery.
Contribution
ChemCLIP is the first framework to apply contrastive learning to bridging organic and inorganic anticancer compounds, creating a shared embedding space organized by biological activity rather than by structural similarity.
Findings
Morgan fingerprints achieved an average alignment ratio of 0.899.
Downstream classification AUCs were 0.859 for inorganic and 0.817 for organic compounds.
Contrastive learning effectively unifies chemically distinct compounds based on activity.
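This summary reports downstream classification AUCs without defining the metric. Assuming they are ROC AUCs (the usual meaning), an AUC of 0.859 can be read as the probability that a randomly chosen positive-class example is scored above a randomly chosen negative-class one. The minimal sketch below computes that quantity directly by pairwise comparison (the Mann-Whitney formulation), counting ties as one half. The function name `roc_auc` is illustrative and is not part of ChemCLIP.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison: the fraction of (positive, negative)
    pairs in which the positive is scored higher, with ties counted as 1/2.
    Illustrative helper, not ChemCLIP code."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Example: one of the four (positive, negative) pairs is mis-ordered.
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The double loop is O(n_pos × n_neg) and is meant only to make the definition concrete; rank-based implementations such as scikit-learn's `roc_auc_score` compute the same value efficiently.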
Abstract
The discovery of anticancer therapeutics has traditionally treated organic small molecules and metal-based coordination complexes as separate chemical domains, limiting knowledge transfer despite their shared biological objectives. The disparity is especially pronounced in the available data: extensive screening databases exist for organic compounds, whereas only a few thousand metal complexes have been characterized. Here, we introduce ChemCLIP, a dual-encoder contrastive learning framework that bridges this organic-inorganic divide by learning unified representations based on shared anticancer activities rather than structural similarity. We compiled complementary datasets comprising 44,854 unique organic compounds and 5,164 unique metal complexes, standardized across 60 cancer cell lines. By training parallel encoders with activity-aware hard negative mining, we mapped structurally distinct…
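The abstract describes training only at a high level (parallel encoders, activity-aware hard negative mining). As a hedged illustration of how such an objective can be written, the sketch below implements a CLIP-style symmetric InfoNCE loss over paired organic and inorganic embeddings, with negatives filtered by activity-profile correlation. Everything here is an assumption rather than ChemCLIP's actual method. The encoders are replaced by precomputed embedding vectors, row i of each batch is assumed to be an activity-matched organic/inorganic pair, and the mining rule, the helper names (`mine_negatives`, `info_nce`, `symmetric_loss`), the correlation cutoff `max_corr`, the negative count `k`, and the temperature `tau` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pearson(u, v):
    """Pearson correlation of two activity profiles (e.g. responses across cell lines)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])


def mine_negatives(i, anchor_emb, cand_emb, anchor_act, cand_act, k=2, max_corr=0.5):
    """HYPOTHETICAL activity-aware hard negative mining for anchor i.

    Row i of the candidate side is assumed to be the anchor's positive partner.
    Candidates whose activity profile correlates with the anchor's above
    max_corr are dropped as likely false negatives; of the rest, the k whose
    embeddings are most similar to the anchor are returned as hard negatives.
    """
    candidates = [
        j for j in range(len(cand_emb))
        if j != i and pearson(anchor_act[i], cand_act[j]) <= max_corr
    ]
    candidates.sort(key=lambda j: cosine(anchor_emb[i], cand_emb[j]), reverse=True)
    return candidates[:k]


def info_nce(anchor, positive, negatives, tau=0.07):
    """InfoNCE: -log softmax probability of the positive among positive + negatives."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract the max logit for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]


def symmetric_loss(org_emb, inorg_emb, org_act, inorg_act, k=2, tau=0.07):
    """CLIP-style loss averaged over both directions (organic->inorganic and back)."""
    n = len(org_emb)
    total = 0.0
    for i in range(n):
        # Organic anchor, inorganic positive and mined inorganic negatives.
        neg = mine_negatives(i, org_emb, inorg_emb, org_act, inorg_act, k)
        total += info_nce(org_emb[i], inorg_emb[i], [inorg_emb[j] for j in neg], tau)
        # Inorganic anchor, organic positive and mined organic negatives.
        neg = mine_negatives(i, inorg_emb, org_emb, inorg_act, org_act, k)
        total += info_nce(inorg_emb[i], org_emb[i], [org_emb[j] for j in neg], tau)
    return total / (2 * n)
```

Unlike standard CLIP, which treats every other item in the batch as a negative, this variant scores each anchor only against mined negatives. The correlation filter is meant to keep compounds with similar activity from being pushed apart as false negatives. For example, with identical one-hot embeddings on both sides (a perfectly aligned batch) the loss is close to zero, while rotating the inorganic rows so that each compound is paired with the wrong partner pushes it above 10 at `tau = 0.07`.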
