# From Latent Manifolds to Targeted Molecular Probes: An Interpretable, Kinome-Scale Generative Machine Learning Framework for Family-Based Kinase Ligand Design

**Authors:** Gennady Verkhivker, Ryan Kassab, Keerthi Krishnan

PMC · DOI: 10.3390/biom16020209 · Biomolecules · 2026-01-29

## TL;DR

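This paper presents a modular generative framework for designing SRC kinase ligands. It combines ChemVAE latent-space modeling, an interpretable Kinase Likelihood Score, Bayesian optimization, and cluster-guided local sampling, and it shows that SMILES-based generation fails to recover the multi-ring aromatic systems typical of kinase chemotypes.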

## Contribution

A modular, chemistry-first generative framework that integrates latent space modeling and interpretable metrics for family-based kinase ligand design.

## Key findings

- SRC-like kinase scaffolds act as a structural hub in latent space, enabling scaffold transformation.
- When local sampling converts scaffolds from other kinase families into SRC-like chemotypes, LCK-derived molecules account for ~40% of the high-similarity outputs.
- SMILES-based representations fail to recover multi-ring aromatic systems, a key feature of kinase ligands.
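
The ring-count gap in the last finding can be checked directly on generated strings. The sketch below is illustrative only and is not the paper's code: it counts rings by pairing ring-closure labels in a SMILES string, which gives the number of ring-closure bonds rather than a full aromaticity perception. The function names `count_rings` and `multi_ring_fraction` and the `min_rings` threshold are assumptions introduced here.

```python
DIGITS = "0123456789"


def count_rings(smiles: str) -> int:
    """Count ring-closure bonds in a SMILES string (hypothetical helper).

    Each ring-closure label appears twice: once to open, once to close.
    Digits inside [...] are isotopes, H counts, or charges and are skipped.
    """
    open_labels = set()
    rings = 0
    in_bracket = False
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif not in_bracket:
            label = None
            two = smiles[i + 1 : i + 3]
            if ch == "%" and len(two) == 2 and all(c in DIGITS for c in two):
                label = int(two)  # two-digit closure such as %12
                i += 2
            elif ch in DIGITS:
                label = int(ch)
            if label is not None:
                if label in open_labels:
                    open_labels.remove(label)  # second occurrence closes the ring
                    rings += 1
                else:
                    open_labels.add(label)  # first occurrence opens it
        i += 1
    return rings


def multi_ring_fraction(smiles_list, min_rings=3):
    """Fraction of strings with at least `min_rings` rings (assumed threshold)."""
    if not smiles_list:
        return 0.0
    hits = sum(1 for s in smiles_list if count_rings(s) >= min_rings)
    return hits / len(smiles_list)
```

Comparing `multi_ring_fraction` on a training set against the same statistic on generated outputs is one simple way to quantify how often multi-ring systems are lost; a cheminformatics toolkit would be needed to confirm that the rings are aromatic.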

## Abstract

Scaffold-aware artificial intelligence (AI) models enable systematic exploration of chemical space conditioned on protein-interacting ligands, yet the representational principles governing their behavior remain poorly understood. The computational representation of structurally complex kinase small molecules remains a formidable challenge due to the high conservation of ATP active site architecture across the kinome and the topological complexity of structural scaffolds in current generative AI frameworks. In this study, we present a diagnostic, modular and chemistry-first generative framework for design of targeted SRC kinase ligands by integrating ChemVAE-based latent space modeling, a chemically interpretable structural similarity metric (Kinase Likelihood Score), Bayesian optimization, and cluster-guided local neighborhood sampling. Using a comprehensive dataset of protein kinase ligands, we examine scaffold topology, latent-space geometry, and model-driven generative trajectories. We show that chemically distinct scaffolds can converge toward overlapping latent representations, revealing intrinsic degeneracy in scaffold encoding, while specific topological motifs function as organizing anchors that constrain generative diversification. The results demonstrate that kinase scaffolds spanning 37 protein kinase families spontaneously organize into a coherent, low-dimensional manifold in latent space, with SRC-like scaffolds acting as a structural “hub” that enables rational scaffold transformation. Our local sampling approach successfully converts scaffolds from other kinase families (notably LCK) into novel SRC-like chemotypes, with LCK-derived molecules accounting for ~40% of high-similarity outputs. 
However, both generative strategies reveal a critical limitation: SMILES-based representations systematically fail to recover multi-ring aromatic systems, a topological hallmark of kinase chemotypes, despite ring count being a top feature in our structural similarity metric. This “representation gap” demonstrates that no amount of scoring refinement can compensate for a generative engine that cannot access topologically constrained regions. By diagnosing these constraints within a transparent pipeline and reframing scaffold-aware ligand design as a problem of molecular representation, our work provides a conceptual framework for interpreting generative model behavior and for guiding the incorporation of structural priors into future molecular AI architectures.
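
The abstract does not spell out how cluster-guided local neighborhood sampling is implemented, so the following is only a minimal sketch of the general idea under stated assumptions: a seed ligand (for example, an LCK scaffold) already has a latent vector, candidates are drawn as Gaussian perturbations around it, and the candidates closest to a target cluster centroid (for example, the SRC-like hub) are kept for decoding. The function name, the noise scale `sigma`, the Euclidean distance criterion, and all numeric values are hypothetical, and the encoder and decoder are omitted.

```python
import math
import random


def sample_neighborhood(seed_z, centroid, n_samples=200, sigma=0.5, keep=10, rng=None):
    """Draw Gaussian perturbations of `seed_z` and keep those nearest `centroid`.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    candidates = [
        [x + rng.gauss(0.0, sigma) for x in seed_z]  # isotropic noise per dimension
        for _ in range(n_samples)
    ]
    # Rank candidates by Euclidean distance to the target cluster centroid.
    candidates.sort(key=lambda z: math.dist(z, centroid))
    return candidates[:keep]


# Toy 4-dimensional latent vectors (made-up numbers for illustration).
lck_seed = [1.0, -0.5, 0.2, 0.8]
src_centroid = [0.0, 0.0, 0.0, 0.0]
selected = sample_neighborhood(lck_seed, src_centroid)
```

In a full pipeline each vector in `selected` would be passed to the decoder and the resulting molecules scored, which is where a metric such as the Kinase Likelihood Score would enter.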

## Linked entities

- **Proteins:** LCK (LCK proto-oncogene, Src family tyrosine kinase)

## Full-text entities

- **Genes:** BRAF (B-Raf proto-oncogene, serine/threonine kinase) [NCBI Gene 673] {aka B-RAF1, B-raf, BRAF-1, BRAF1, NS7, RAFB1}, CLK1 (CDC like kinase 1) [NCBI Gene 1195] {aka CLK, CLK/STY, STY}, FLT3 (fms related receptor tyrosine kinase 3) [NCBI Gene 2322] {aka CD135, FLK-2, FLK2, STK1}, DDR2 (discoidin domain receptor tyrosine kinase 2) [NCBI Gene 4921] {aka DDR2-N, MIG20a, NTRKR3, TKT, TYRO10, WRCN}, MAP3K9 (mitogen-activated protein kinase kinase kinase 9) [NCBI Gene 4293] {aka MEKK9, MLK1, PRKE1}, DDR1 (discoidin domain receptor tyrosine kinase 1) [NCBI Gene 780] {aka CAK, CD167, DDR, EDDR1, HGK2, MCK10}, ATP8A2 (ATPase phospholipid transporting 8A2) [NCBI Gene 51761] {aka ATP, ATPIB, CAMRQ4, IB, ML-1}, LRRK1 (leucine rich repeat kinase 1) [NCBI Gene 79705] {aka OSMD, RIPK6, Roco1}, MAPK14 (mitogen-activated protein kinase 14) [NCBI Gene 1432] {aka CSBP, CSBP1, CSBP2, CSPB1, EXIP, Mxi2}, PDGFRA (platelet derived growth factor receptor alpha) [NCBI Gene 5156] {aka CD140A, PDGFR-2, PDGFR2}, ANOS1 (anosmin 1) [NCBI Gene 3730] {aka ADMLX, HH1, HHA, KAL, KAL1, KALIG-1}, MAP4K1 (mitogen-activated protein kinase kinase kinase kinase 1) [NCBI Gene 11184] {aka HPK1}, RAF1 (Raf-1 proto-oncogene, serine/threonine kinase) [NCBI Gene 5894] {aka CMD1NN, CRAF, NS5, Raf-1, c-Raf}, EGFR (epidermal growth factor receptor) [NCBI Gene 1956] {aka ERBB, ERBB1, ERRP, HER1, NISBD2, NNCIS}, LRRK2 (leucine rich repeat kinase 2) [NCBI Gene 120892] {aka AURA17, DARDARIN, PARK8, RIPK7, ROCO2}, ERBB4 (erb-b2 receptor tyrosine kinase 4) [NCBI Gene 2066] {aka ALS19, HER4, p180erbB4}, CSF1R (colony stimulating factor 1 receptor) [NCBI Gene 1436] {aka BANDDOS, C-FMS, CD115, CSF-1R, CSFR, FIM2}, SGK1 (serum/glucocorticoid regulated kinase 1) [NCBI Gene 6446] {aka SGK}, ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064] {aka CD340, HER-2, HER-2/neu, HER2, MLN 19, MLN-19}, SLTM (SAFB like transcription modulator) [NCBI Gene 79811] {aka Met}, ZHX2 (zinc fingers and homeoboxes 2) [NCBI 
Gene 22882] {aka AFR1, RAF}, MAP3K13 (mitogen-activated protein kinase kinase kinase 13) [NCBI Gene 9175] {aka LZK, MEKK13, MLK}, MAP3K11 (mitogen-activated protein kinase kinase kinase 11) [NCBI Gene 4296] {aka MEKK11, MLK-3, MLK3, PTK1, SPRK}, INSR (insulin receptor) [NCBI Gene 3643] {aka CD220, HHF5}, STK24 (serine/threonine kinase 24) [NCBI Gene 8428] {aka HEL-S-95, MST3, MST3B, STE20, STK3}, LCK (LCK proto-oncogene, Src family tyrosine kinase) [NCBI Gene 3932] {aka IMD22, LSK, YT16, p56lck, pp58lck}, PDGFRB (platelet derived growth factor receptor beta) [NCBI Gene 5159] {aka CD140B, IBGC4, IMF1, JTK12, KOGS, OPDKD}, KIT (KIT proto-oncogene, receptor tyrosine kinase) [NCBI Gene 3815] {aka C-Kit, CD117, MASTC, PBT, SCFR}, TLK1 (tousled like kinase 1) [NCBI Gene 9874] {aka PKU-beta}, MAP3K10 (mitogen-activated protein kinase kinase kinase 10) [NCBI Gene 4294] {aka MEKK10, MLK2, MST}, FYN (FYN proto-oncogene, Src family tyrosine kinase) [NCBI Gene 2534] {aka SLK, SYN, p59-FYN}, ARAF (A-Raf proto-oncogene, serine/threonine kinase) [NCBI Gene 369] {aka A-RAF, ARAF1, PKS2, RAFA1}, ABL1 (ABL proto-oncogene 1, non-receptor tyrosine kinase) [NCBI Gene 25] {aka ABL, BCR-ABL, CHDSKM, JTK7, bcr/abl, c-ABL}, TLK2 (tousled like kinase 2) [NCBI Gene 11011] {aka HsHPK, MRD57, PKU-ALPHA}, ROS1 (ROS proto-oncogene 1, receptor tyrosine kinase) [NCBI Gene 6098] {aka MCF3, ROS, c-ros-1}, IGF1R (insulin like growth factor 1 receptor) [NCBI Gene 3480] {aka CD221, IGFIR, IGFR, JTK13}, MAPK10 (mitogen-activated protein kinase 10) [NCBI Gene 5602] {aka JNK3, JNK3A, PRKM10, SAPK1b, p493F12, p54bSAPK}, SRC (SRC proto-oncogene, non-receptor tyrosine kinase) [NCBI Gene 6714] {aka ASV, SRC1, THC6, c-SRC, p60-Src}, KDR (kinase insert domain receptor) [NCBI Gene 3791] {aka CD309, FLK1, VEGFR, VEGFR2}, NTRK1 (neurotrophic receptor tyrosine kinase 1) [NCBI Gene 4914] {aka MTC, TRK, TRK1, TRKA, Trk-A, p140-TrkA}, LTK (leukocyte receptor tyrosine kinase) [NCBI Gene 4058] {aka TYK1}, RPS6KA2 
(ribosomal protein S6 kinase A2) [NCBI Gene 6196] {aka HU-2, MAPKAPK1C, RSK, RSK3, S6K-alpha, S6K-alpha2}, ALK (ALK receptor tyrosine kinase) [NCBI Gene 238] {aka ALK1, CD246, NBLST3}, RYK (receptor like tyrosine kinase) [NCBI Gene 6259] {aka D3S3195, JTK5, JTK5A, RYK1}, ABL2 (ABL proto-oncogene 2, non-receptor tyrosine kinase) [NCBI Gene 27] {aka ABLL, ARG}, MST1R (macrophage stimulating 1 receptor) [NCBI Gene 4486] {aka CD136, CDw136, NPCA3, PTK8, RON, SEA}
- **Diseases:** injury to (MESH:D014947)
- **Chemicals:** Hydrogen (MESH:D006859), adenine (MESH:D000225), pyrimidine (MESH:C030986), ATP (MESH:D000255), amide (MESH:D000577), quinazoline (MESH:D011799), Dasatinib (MESH:D000069439), pyrrolopyrimidine (MESH:C527741), Cl (MESH:D002713), C (MESH:D002244), ChemVAE (-), Ponatinib (MESH:C545373)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12938821/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12938821/full.md

## References

111 references — full list in the complete paper: https://tomesphere.com/paper/PMC12938821/full.md

---
Source: https://tomesphere.com/paper/PMC12938821