# All That Glitters Is Not Gold: Importance of Rigorous Evaluation of Proteochemometric Models

**Authors:** Polina Avdiunina, Shamieraah Jamal, Filipp Gusev, Olexandr Isayev

PMC · DOI: 10.1021/acs.jcim.5c00395 · Journal of Chemical Information and Modeling · 2025-09-03

## TL;DR

This paper emphasizes the need for rigorous evaluation of proteochemometric models in drug discovery to ensure reliable and generalizable predictions.

## Contribution

The study highlights the importance of data splitting and class balance in PCM performance and advocates for stricter evaluation standards.

## Key findings

- Data splitting and class imbalances are the most critical factors affecting PCM performance.
- Protein embeddings contribute minimally to PCM efficacy according to permutation testing.
- Stricter evaluation standards are needed to improve model generalizability and benchmarking practices.
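
The permutation-testing finding can be illustrated with a minimal sketch. It is not the authors' pipeline: the ridge model, the synthetic data, the random hold-out, and the helper names (`fit_ridge`, `protein_permutation_check`) are all assumptions made for illustration. The idea is to shuffle the protein feature rows across test samples and see whether the score drops. If it barely moves, the model is not using the protein information.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the bias column is left unregularized."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)

def predict(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def protein_permutation_check(lig, prot, y, n_perm=20, seed=0):
    """Return (R2 with intact features, mean R2 with shuffled protein rows)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.permutation(n)
    # Random hold-out, used here only to keep the sketch short.
    tr, te = idx[: n * 3 // 4], idx[n * 3 // 4 :]
    w = fit_ridge(np.hstack([lig[tr], prot[tr]]), y[tr])
    base = r2(y[te], predict(w, np.hstack([lig[te], prot[te]])))
    permuted = []
    for _ in range(n_perm):
        # Break the protein-ligand pairing while keeping ligand features fixed.
        shuffled = prot[te][rng.permutation(len(te))]
        permuted.append(r2(y[te], predict(w, np.hstack([lig[te], shuffled]))))
    return float(base), float(np.mean(permuted))

# Synthetic data (hypothetical): the label depends on ligand features only.
rng = np.random.default_rng(42)
n = 400
lig = rng.normal(size=(n, 8))
prot = rng.normal(size=(n, 8))
y = lig @ rng.normal(size=8) + 0.1 * rng.normal(size=n)

base_r2, permuted_r2 = protein_permutation_check(lig, prot, y)
print(f"R2 intact: {base_r2:.3f}  R2 with shuffled protein features: {permuted_r2:.3f}")
```

In this toy setup the two scores stay close because the label was constructed to ignore the protein features. A large gap would instead suggest the protein representation carries signal the model relies on.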

## Abstract

Proteochemometric models (PCMs) are used in computational drug discovery to employ both protein and ligand representations jointly for bioactivity prediction. While machine learning (ML) and deep learning (DL) have come to dominate PCMs, often serving as a basis for scoring functions, rigorous evaluation standards have not always been consistently applied. In this study, using kinase-ligand bioactivity prediction as a model system, we highlight the critical roles of data set curation, permutation testing, class imbalances, and various data splitting strategies for mitigating plausible data leakage and embedding quality in determining model performance. Our findings indicate that data splitting and class imbalances are the most critical factors affecting PCM performance, emphasizing the challenges in the generalizing ability of ML/DL-PCMs. We evaluated various protein–ligand descriptors and embeddings, including those augmented with multiple sequence alignment information. However, permutation testing consistently demonstrated that protein embeddings contributed minimally to PCM efficacy. This study advocates for the adoption of stringent evaluation standards to enhance the generalizability of models to out-of-distribution data and improve benchmarking practices.
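
As a rough illustration of why the splitting strategy matters, the sketch below assigns whole groups (here, kinases) to either the training or the test set, so that no kinase appears in both, and then reports the class balance of each side. It does not reproduce the splits used in the paper: the record layout, the `group_split` and `active_fraction` helpers, and the toy kinase and ligand identifiers are all hypothetical.

```python
import random
from collections import defaultdict

def group_split(records, group_key, test_fraction=0.2, seed=0):
    """Assign whole groups to train or test so that no group spans both sets."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    target = test_fraction * len(records)
    train, test = [], []
    for key in keys:
        # Fill the test set group by group until it reaches the target size.
        (test if len(test) < target else train).extend(groups[key])
    return train, test

def active_fraction(records):
    """Fraction of records labelled active; exposes class imbalance per split."""
    return sum(r["active"] for r in records) / len(records)

# Toy data (hypothetical): 10 kinases x 10 ligands, about 20% actives.
records = [
    {"kinase": f"K{i}", "ligand": f"L{j}", "active": int((i + j) % 5 == 0)}
    for i in range(10)
    for j in range(10)
]

train_set, test_set = group_split(records, group_key="kinase")
train_kinases = {r["kinase"] for r in train_set}
test_kinases = {r["kinase"] for r in test_set}

print("shared kinases:", train_kinases & test_kinases)
print("active fraction train/test:", active_fraction(train_set), active_fraction(test_set))
```

The same function can group by ligand identifier or scaffold instead of kinase by changing `group_key`. Checking the active fraction on each side is a cheap way to spot splits whose class balance differs enough to distort a metric such as accuracy.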

## Full-text entities

- **Genes:** ABL1 (ABL proto-oncogene 1, non-receptor tyrosine kinase) [NCBI Gene 25] {aka ABL, BCR-ABL, CHDSKM, JTK7, bcr/abl, c-ABL}, TXK (TXK tyrosine kinase) [NCBI Gene 7294] {aka BTKL, PSCTK5, PTK4, RLK, TKL}, ALK (ALK receptor tyrosine kinase) [NCBI Gene 238] {aka ALK1, CD246, NBLST3}
- **Diseases:** leukemia (MESH:D007938), nonsmall cell lung cancer (MESH:D002289), breast cancer (MESH:D001943), neural and metabolic disorders (MESH:D008659), renal cell carcinoma (MESH:D002292)
- **Chemicals:** DUD- (-), amino acid (MESH:D000596), hydrogen (MESH:D006859)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12529762/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12529762/full.md

## References

60 references — full list in the complete paper: https://tomesphere.com/paper/PMC12529762/full.md

---
Source: https://tomesphere.com/paper/PMC12529762