# Learned Conformational Space and Pharmacophore Into Molecular Foundational Model

**Authors:** Lin Wang, Yifan Wu, Hao Luo, Minglong Liang, Yihang Zhou, Cheng Chen, Chris Liu, Jun Zhang, Yang Zhang

PMC · DOI: 10.1002/advs.202513556 · Advanced Science · 2026-01-04

## TL;DR

This paper introduces a new molecular model that integrates conformational and pharmacophore information to improve molecular representation and generation for drug design.

## Contribution

The model integrates conformational-space and pharmacophore-similarity projections into the pre-training of a molecular foundational model, using them to regularize the learned representation space.

## Key findings

- The model effectively performs virtual screening, property prediction, and molecular optimization.
- The Ouroboros architecture enables faithful structure reconstruction without prompts or noise.
- The model supports targeted poly-pharmacology design and structure optimization.
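Similarity-based virtual screening with a representation model usually comes down to ranking a library by how close each molecule's embedding lies to a query's embedding. The sketch below shows that ranking step on toy vectors. It assumes cosine similarity as the metric and uses hypothetical molecule names and 4-dimensional embeddings. The summary does not state which metric or embedding size the authors use; in practice the vectors would come from the model's graph encoder.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def screen(query: Sequence[float],
           library: Dict[str, Sequence[float]],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank library molecules by embedding similarity to the query (assumed metric: cosine)."""
    scored = [(name, cosine(query, emb)) for name, emb in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical 4-dimensional embeddings standing in for encoder outputs.
    library = {
        "mol_A": [0.9, 0.1, 0.0, 0.2],
        "mol_B": [0.1, 0.8, 0.3, 0.0],
        "mol_C": [0.8, 0.2, 0.1, 0.3],
    }
    query = [1.0, 0.0, 0.0, 0.25]
    for name, score in screen(query, library, top_k=2):
        print(f"{name}\t{score:.3f}")
```

Because only embeddings are compared, the library can be encoded once and reused for any number of queries.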

## Abstract

The emergence of large‐scale chemical pre‐trained models has significantly advanced our ability to capture complex relationships between molecular structures and their functions. Despite the growing interest in molecular foundational models that provide versatile representations and support molecular optimization for downstream tasks, few efforts have integrated explicit chemical knowledge, such as conformational and pharmacophore information, into pre‐training. Given the highly dynamic nature of small molecules in solution, their conformational changes upon target binding, and the critical role of pharmacophore complementarity, it is essential to incorporate these factors into molecular foundational modeling. Here, we present a molecular foundational model that integrates conformational‐space and pharmacophore‐similarity projections during pre‐training to regularize the representation space. The model adopts an Ouroboros‐like architecture, where the molecular graphs are encoded into 1D representation vectors via a graph neural network and subsequently reconstructed back into SMILES sequences through an autoregressive Transformer module. This dual‐module design establishes a flexible and extensible framework for both representation learning and molecular generation within a unified latent space. Extensive experiments demonstrate that our model effectively addresses a variety of practical chemical challenges, including similarity‐based virtual screening, targeted poly‐pharmacology design, chemical property prediction, and directed molecular optimization.
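The abstract describes encoding a molecular graph into a single 1D vector with a graph neural network. A minimal sketch of that idea follows, assuming a simplified GIN-style layer (sum each atom's features with its neighbours', apply a linear map, then ReLU) and mean pooling as the readout. The element vocabulary, weight matrices, and function names are hypothetical; the summary does not describe the authors' actual layer types, atom features, or readout.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical toy atom vocabulary; real encoders use richer atom and bond features.
ELEMENTS = ("C", "N", "O")


def one_hot(symbol: str) -> Vector:
    """Initial atom feature: one-hot encoding of the element symbol."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def mp_layer(h: List[Vector], adj: Dict[int, List[int]], w: Matrix) -> List[Vector]:
    """Simplified GIN-style update: h_i' = ReLU(W (h_i + sum of neighbour features))."""
    out: List[Vector] = []
    for i, hi in enumerate(h):
        agg = list(hi)
        for j in adj[i]:
            agg = [a + b for a, b in zip(agg, h[j])]
        out.append([max(0.0, v) for v in matvec(w, agg)])
    return out


def encode(atoms: List[str], bonds: List[Tuple[int, int]], layers: List[Matrix]) -> Vector:
    """Encode a molecular graph into one fixed-length vector (message passing + mean pooling)."""
    adj: Dict[int, List[int]] = {i: [] for i in range(len(atoms))}
    for a, b in bonds:  # bonds are undirected
        adj[a].append(b)
        adj[b].append(a)
    h = [one_hot(s) for s in atoms]
    for w in layers:
        h = mp_layer(h, adj, w)
    # Mean pooling makes the output independent of atom numbering.
    return [sum(col) / len(h) for col in zip(*h)]


if __name__ == "__main__":
    identity = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
    # Heavy-atom graph of ethanol (C-C-O), written with two different atom orders.
    v1 = encode(["C", "C", "O"], [(0, 1), (1, 2)], [identity])
    v2 = encode(["O", "C", "C"], [(0, 1), (1, 2)], [identity])
    print(v1)
    print(v2)
```

Both calls print the same vector, which is the property a graph-level representation needs: the same molecule maps to the same point regardless of how its atoms are numbered.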

The Ouroboros model introduces two orthogonal modules within a unified framework that independently learn molecular representations and generate chemical structures. This design enables flexible optimization strategies for each module and faithful structure reconstruction without prompts or noise. By integrating conformational space and pharmacophore projections, Ouroboros enables versatile applications ranging from molecular property prediction to targeted poly‐pharmacology and structure optimization.
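One way to read "projecting" conformational and pharmacophore similarity onto the representation space is as a penalty that pulls pairwise embedding similarities toward precomputed similarity scores. The sketch below assumes that reading: it measures the mean squared gap between pairwise cosine similarities of embeddings and a target similarity matrix, then sums one such term for pharmacophore similarity and one for conformational similarity with hypothetical weights. The loss form, the weights, and all function names are assumptions; this summary does not state how the target similarities are computed or how the term enters the encoder's training objective.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def alignment_loss(embeddings: List[Sequence[float]], target: List[List[float]]) -> float:
    """Mean squared gap between embedding cosine similarity and target[i][j] over pairs i < j."""
    n = len(embeddings)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            gap = cosine(embeddings[i], embeddings[j]) - target[i][j]
            total += gap * gap
            pairs += 1
    return total / pairs if pairs else 0.0


def projection_regularizer(embeddings: List[Sequence[float]],
                           pharm_sim: List[List[float]],
                           conf_sim: List[List[float]],
                           w_pharm: float = 1.0,
                           w_conf: float = 1.0) -> float:
    """Hypothetical combined penalty: one weighted alignment term per similarity source."""
    return (w_pharm * alignment_loss(embeddings, pharm_sim)
            + w_conf * alignment_loss(embeddings, conf_sim))


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Hypothetical precomputed similarities in [0, 1] for three molecules.
    pharm = [[1.0, 0.2, 0.7], [0.2, 1.0, 0.7], [0.7, 0.7, 1.0]]
    conf = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.5, 0.5, 1.0]]
    print(round(projection_regularizer(emb, pharm, conf, 0.5, 0.5), 4))
```

Under this reading, two molecules with similar pharmacophores or overlapping conformational ensembles are pushed toward nearby embeddings even when their SMILES strings look different, which is the behaviour similarity-based screening relies on.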

## Full-text entities

- **Genes:** GPR166P (G protein-coupled receptor 166, pseudogene) [NCBI Gene 442206] {aka GPCR, PGR9}, TNFRSF10C (TNF receptor superfamily member 10c) [NCBI Gene 8794] {aka CD263, DCR1, DCR1-TNFR, LIT, TRAIL-R3, TRAILR3}, TP53 (tumor protein p53) [NCBI Gene 7157] {aka BCC7, BMFS5, LFS1, P53, TRP53}, CHEK1 (checkpoint kinase 1) [NCBI Gene 1111] {aka CHK1, OZEMA21}, PRMT5 (protein arginine methyltransferase 5) [NCBI Gene 10419] {aka HRMT1L5, HSL7, IBP72, JBP1, SKB1, SKB1Hs}, PIK3CG (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit gamma) [NCBI Gene 5294] {aka IMD97, PI3CG, PI3K, PI3Kgamma, PIK3, p110gamma}, ARID1A (AT-rich interaction domain 1A) [NCBI Gene 8289] {aka B120, BAF250, BAF250a, BM029, C1orf4, CSS2}, KRAS (KRAS proto-oncogene, GTPase) [NCBI Gene 3845] {aka C-K-RAS, CFC2, K-RAS2A, K-RAS2B, K-RAS4A}, SMAD4 (SMAD family member 4) [NCBI Gene 4089] {aka DPC4, JIP, MADH4, MYHRS}, PIK3CA (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha) [NCBI Gene 5290] {aka CCM4, CLAPO, CLOVE, CWS5, HMH, MCAP}, CDKN2A (cyclin dependent kinase inhibitor 2A) [NCBI Gene 1029] {aka ARF, CAI2, CDK4I, CDKN2, CMM2, INK4}, RB1 (RB transcriptional corepressor 1) [NCBI Gene 5925] {aka OSRC, PPP1R130, RB, p105-Rb, p110-RB1, pRb}, AURKA (aurora kinase A) [NCBI Gene 6790] {aka AIK, ARK1, AURA, BTAK, PPP1R47, STK15}, WEE1 (WEE1 G2 checkpoint kinase) [NCBI Gene 7465] {aka WEE1A, WEE1hu}, PIK3CB (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit beta) [NCBI Gene 5291] {aka P110BETA, PI3K, PI3KBETA, PIK3C1}, PDE4B (phosphodiesterase 4B) [NCBI Gene 5142] {aka DPDE4, PDEIVB}, PLK1 (polo like kinase 1) [NCBI Gene 5347] {aka PLK, STPK13}, PARP12 (poly(ADP-ribose) polymerase family member 12) [NCBI Gene 64761] {aka ARTD12, MST109, MSTP109, ZC3H1, ZC3HDC1}, MTAP (methylthioadenosine phosphorylase) [NCBI Gene 4507] {aka BDMF, DMSFH, DMSMFH, HEL-249, LGMBF, MSAP}, SMARCA4 (SWI/SNF related BAF chromatin remodeling complex subunit ATPase 4) [NCBI Gene 6597] {aka BAF190, BAF190A, BRG1, CSS4, MRD16, OTSC12}, MAP2K1 (mitogen-activated protein kinase kinase 1) [NCBI Gene 5604] {aka CFC3, MAPKK1, MEK1, MEL, MKK1, PRKMK1}
- **Diseases:** irritation (MESH:D001523), eye (MESH:D005134), breast cancer (MESH:D001943), Cancer (MESH:D009369)
- **Chemicals:** 2-oxazolidinone (MESH:D023303), amide (MESH:D000577), Diosmin (MESH:D004145), 1,3,8-triazanaphthalene (-), hydrogen (MESH:D006859), Amoxicillin (MESH:D000658), imatinib (MESH:D000068877), CO2 (MESH:D002245), ATP (MESH:D000255), nitrogen (MESH:D009584), pyrazolo pyridine (MESH:C118531), water (MESH:D014867), aspirin (MESH:D001241), DMSO (MESH:D004121), ionox 330 (MESH:C005054)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** MCF10A: Homo sapiens (Human), Spontaneously immortalized cell line (CVCL_0598); MDA-MB-468: Homo sapiens (Human), Breast adenocarcinoma, Cancer cell line (CVCL_0419); HEK293: Homo sapiens (Human), Transformed cell line (CVCL_0045)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13042461/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13042461/full.md

## References

98 references — full list in the complete paper: https://tomesphere.com/paper/PMC13042461/full.md

---
Source: https://tomesphere.com/paper/PMC13042461