# DARKIN: a zero-shot benchmark for phosphosite–dark kinase association using protein language models

**Authors:** Emine Ayşe Sunar, Zeynep Işık, Mert Pekey, Ramazan Gökberk Cinbiş, Oznur Tastan

PMC · DOI: 10.1093/bioinformatics/btaf480 · Bioinformatics · 2025-10-29

## TL;DR

DARKIN is a zero-shot classification benchmark for evaluating how well protein language models assign phosphosites to understudied (dark) kinases, a step that remains hard to resolve experimentally.

## Contribution

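DARKIN introduces a zero-shot benchmark for phosphosite–dark kinase association, with training, validation, and test folds built for the zero-shot setting, so that protein language models can be compared systematically using two zero-shot classifiers (a training-free k-NN-based method and a bilinear classifier).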

## Key findings

- ESM, ProtT5-XL, and SaProt showed the best performance in phosphosite–dark kinase classification.
- DARKIN's training, validation, and test folds respect the zero-shot nature of the task and are stratified by kinase group and sequence similarity.
- The benchmark supports deeper exploration of under-characterized kinases through a biologically relevant framework.
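
The zero-shot constraint in the second finding means that kinases used for testing must not appear as labels during training. The sketch below illustrates that idea only; it is not the authors' splitting procedure. The function name `zero_shot_split`, its parameters, and the toy identifiers are hypothetical, stratification is done by kinase group alone, and the sequence-similarity criterion mentioned in the abstract is deliberately left out.

```python
import random
from collections import defaultdict


def zero_shot_split(pairs, kinase_group, test_fraction=0.2, seed=0):
    """Split (site_id, kinase_id) pairs so held-out kinases never occur in training.

    pairs        : list of (site_id, kinase_id) tuples
    kinase_group : dict mapping kinase_id -> group label (hypothetical input)
    Returns (train_pairs, test_pairs, held_out_kinases).
    """
    rng = random.Random(seed)

    # Collect the distinct kinases of each group in a deterministic order.
    by_group = defaultdict(list)
    for kinase in sorted({k for _, k in pairs}):
        by_group[kinase_group[kinase]].append(kinase)

    held_out = set()
    for group in sorted(by_group):
        members = by_group[group]
        rng.shuffle(members)
        # Hold out a share of every group; single-kinase groups stay in training.
        n_test = int(round(test_fraction * len(members))) if len(members) > 1 else 0
        held_out.update(members[:n_test])

    # A pair follows its kinase, so the label sets of the two folds are disjoint.
    train = [(s, k) for s, k in pairs if k not in held_out]
    test = [(s, k) for s, k in pairs if k in held_out]
    return train, test, held_out


if __name__ == "__main__":
    # Toy data: two groups ("A", "B") with five made-up kinases each.
    kinases = ["A%d" % i for i in range(5)] + ["B%d" % i for i in range(5)]
    groups = {k: k[0] for k in kinases}
    pairs = [("site%d" % i, kinases[i % 10]) for i in range(30)]
    train, test, held_out = zero_shot_split(pairs, groups, test_fraction=0.4)
    print(sorted(held_out), len(train), len(test))
```

The split is made over kinases rather than over phosphosite–kinase pairs, which is what keeps the test labels unseen. A phosphosite linked to several kinases can still occur in both folds under this simplification.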

## Abstract

Protein language models (pLMs) have emerged as powerful tools for capturing the intricate information encoded in protein sequences, facilitating various downstream protein prediction tasks. With numerous pLMs available, there is a critical need for diverse benchmarks to systematically evaluate their performance across biologically relevant tasks. Here, we introduce DARKIN, a zero-shot classification benchmark designed to assign phosphosites to understudied kinases, termed dark kinases. Kinases, which catalyze phosphorylation, are central to cellular signaling pathways. While phosphoproteomics enables the large-scale identification of phosphosites, determining the cognate kinase responsible for the phosphorylation event remains an experimental challenge.

In DARKIN, we prepared training, validation, and test folds that respect the zero-shot nature of this classification problem, incorporating stratification based on kinase groups and sequence similarity. We evaluated multiple pLMs using two zero-shot classifiers: a novel, training-free k-NN-based method, and a bilinear classifier. Our findings indicate that ESM, ProtT5-XL, and SaProt exhibit superior performance on this task. DARKIN provides a challenging benchmark for assessing pLM efficacy and fosters deeper exploration of under-characterized (dark) kinases by offering a biologically relevant test bed.
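
The abstract does not spell out how the training-free k-NN classifier works, so the following is a minimal sketch of one plausible design and not the method from the paper. It assumes that phosphosites and kinases each have a fixed-length embedding (for example from a pLM), finds the `k` training phosphosites most similar to the query, averages the embeddings of their known kinases, and ranks the dark kinases by cosine similarity to that average. All names (`knn_zero_shot`, `train_sites`, `kinase_vecs`) and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_zero_shot(site_vec, train_sites, kinase_vecs, dark_kinases, k=3):
    """Rank dark kinases for one phosphosite without training any parameters.

    site_vec     : embedding of the query phosphosite
    train_sites  : list of (site_embedding, kinase_id) for kinases seen in training
    kinase_vecs  : dict kinase_id -> kinase embedding (seen and dark kinases)
    dark_kinases : kinase ids that never occur in train_sites
    """
    # Step 1: the k training phosphosites most similar to the query.
    neighbours = sorted(
        train_sites, key=lambda item: cosine(site_vec, item[0]), reverse=True
    )[:k]

    # Step 2: average the embeddings of the kinases attached to those neighbours.
    dim = len(next(iter(kinase_vecs.values())))
    prototype = [0.0] * dim
    for _, kinase_id in neighbours:
        for i, value in enumerate(kinase_vecs[kinase_id]):
            prototype[i] += value / len(neighbours)

    # Step 3: rank the unseen kinases by similarity to the averaged kinase vector.
    return sorted(
        dark_kinases, key=lambda d: cosine(prototype, kinase_vecs[d]), reverse=True
    )


if __name__ == "__main__":
    # Two-dimensional toy embeddings, for illustration only.
    train_sites = [([1.0, 0.0], "K1"), ([0.9, 0.1], "K1"), ([0.0, 1.0], "K2")]
    kinase_vecs = {
        "K1": [1.0, 0.0], "K2": [0.0, 1.0],
        "D1": [0.9, 0.2], "D2": [0.1, 1.0],
    }
    print(knn_zero_shot([1.0, 0.05], train_sites, kinase_vecs, ["D1", "D2"], k=2))
```

A bilinear classifier would take a different route: a common formulation, assumed here and not confirmed by the abstract, learns a matrix W on the seen kinases and scores each phosphosite–kinase pair as sᵀWk, which can then be applied to dark kinases through their embeddings.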

The DARKIN benchmark data and the scripts for generating additional splits are publicly available at: https://github.com/tastanlab/darkin

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12579546/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12579546/full.md

## References

66 references — full list in the complete paper: https://tomesphere.com/paper/PMC12579546/full.md

---
Source: https://tomesphere.com/paper/PMC12579546