Layer Probing Improves Kinase Functional Prediction with Protein Language Models

Ajit Kumar; IndraPrakash Jha

arXiv:2512.00376·q-bio.QM·December 2, 2025

Open Access

TL;DR

This study evaluates every layer of a protein language model (ESM-2) and shows that embeddings from mid-to-late layers predict kinase function more accurately than final-layer embeddings alone.

Contribution

The paper systematically evaluates all 33 layers of ESM-2 and shows that intermediate layers carry biological signal relevant to kinase function prediction, improving on the common practice of using only final-layer embeddings.

Findings

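01 Mid-to-late layers (20-33) outperform the final layer by 32 percent in Adjusted Rand Index for unsupervised clustering.

02 Homology-aware supervised accuracy improves to 75.7% with layer selection.

03 Domain-level extraction, calibrated probability estimates, and a reproducible benchmarking pipeline strengthen reliability.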

Abstract

Protein language models (PLMs) have transformed sequence-based protein analysis, yet most applications rely only on final-layer embeddings, which may overlook biologically meaningful information encoded in earlier layers. We systematically evaluate all 33 layers of ESM-2 for kinase functional prediction using both unsupervised clustering and supervised classification. We show that mid-to-late transformer layers (layers 20-33) outperform the final layer by 32 percent in unsupervised Adjusted Rand Index and improve homology-aware supervised accuracy to 75.7 percent. Domain-level extraction, calibrated probability estimates, and a reproducible benchmarking pipeline further strengthen reliability. Our results demonstrate that transformer depth contains functionally distinct biological signals and that principled layer selection significantly improves kinase function prediction.
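
The unsupervised comparison described in the abstract can be illustrated with a minimal sketch: score each layer's cluster assignments against known kinase group labels using the Adjusted Rand Index (ARI), then pick the best-scoring layer. This is not the authors' pipeline. The function names `adjusted_rand_index` and `best_layer`, the kinase group labels, and the per-layer cluster assignments are all hypothetical toy values chosen for illustration, and the ARI is implemented directly from its standard pair-counting definition so the example needs only the standard library.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index from the standard pair-counting definition."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: trivially identical partitions

    # Contingency table cells and its row/column marginals.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


def best_layer(labels_true, clusters_by_layer):
    """Return (best layer, {layer: ARI}) for per-layer cluster assignments."""
    scores = {
        layer: adjusted_rand_index(labels_true, pred)
        for layer, pred in clusters_by_layer.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy data: six kinases from two groups, with made-up cluster
# assignments for an intermediate layer (24) and the final layer (33).
kinase_groups = ["TK", "TK", "TK", "CMGC", "CMGC", "CMGC"]
clusters_by_layer = {
    24: [0, 0, 0, 1, 1, 1],  # recovers both groups exactly
    33: [0, 0, 1, 1, 2, 2],  # splits and mixes the groups
}

layer, scores = best_layer(kinase_groups, clusters_by_layer)
print(f"best layer: {layer}")
for k in sorted(scores):
    print(f"layer {k}: ARI = {scores[k]:.3f}")
```

In a real experiment the cluster assignments would come from clustering the per-layer embeddings of each kinase sequence; the selection step itself stays the same. Because ARI is corrected for chance, scores remain comparable across layers even when the number of clusters differs.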

Taxonomy

Topics: Machine Learning in Bioinformatics · Biomedical Text Mining and Ontologies · Bioinformatics and Genomic Networks