Layer Probing Improves Kinase Functional Prediction with Protein Language Models
Ajit Kumar, IndraPrakash Jha

TL;DR
This study shows that analyzing all layers of protein language models, especially mid-to-late layers, enhances kinase function prediction accuracy compared to using only final-layer embeddings.
Contribution
The paper systematically evaluates all 33 layers of ESM-2. It shows that intermediate layers carry biological signal useful for kinase function prediction, and that selecting among layers improves on the common practice of using only final-layer embeddings.
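A layer-wise evaluation of this kind can be sketched as a loop over layers. Pool each protein's per-residue vectors into one sequence embedding, score that layer with some evaluation function, then compare the best layer against the final one. This is a minimal, dependency-free sketch under stated assumptions, not the authors' pipeline:

- `mean_pool`, `sweep_layers`, and `best_vs_final` are hypothetical names.
- Mean pooling is an assumed pooling step.
- Per-residue vectors are assumed to have special tokens (BOS/EOS) already removed.
- `score` stands in for whatever clustering or classification metric is used.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def mean_pool(residues: Sequence[Vector]) -> Vector:
    """Average per-residue vectors into one fixed-length sequence embedding.

    Assumption: special tokens (BOS/EOS) were stripped before calling this.
    """
    if not residues:
        raise ValueError("cannot pool an empty sequence")
    n, dim = len(residues), len(residues[0])
    return [sum(r[d] for r in residues) / n for d in range(dim)]


def sweep_layers(
    per_layer: Dict[int, List[Sequence[Vector]]],
    score: Callable[[List[Vector]], float],
) -> Dict[int, float]:
    """Score every layer.

    per_layer maps a layer index to one per-residue matrix per protein
    (a hypothetical layout); score is any metric over pooled embeddings.
    """
    return {
        layer: score([mean_pool(protein) for protein in proteins])
        for layer, proteins in per_layer.items()
    }


def best_vs_final(scores: Dict[int, float], final_layer: int) -> Tuple[int, float]:
    """Return the best-scoring layer and its relative gain over the final layer."""
    baseline = scores[final_layer]
    if baseline <= 0:
        # A relative gain is not meaningful for a zero or negative baseline
        # (ARI, for example, can be negative).
        raise ValueError("final-layer score must be positive")
    best = max(scores, key=scores.get)
    return best, (scores[best] - baseline) / baseline
```

For example, `best_vs_final({20: 0.5, 33: 0.4}, final_layer=33)` returns layer 20 with a relative gain of about 0.25. Keeping the metric as a callable lets the same sweep serve both the unsupervised and the supervised comparison.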
Findings
Mid-to-late layers (20-33) outperform the final layer in unsupervised clustering, by 32% in Adjusted Rand Index.
Layer selection raises homology-aware supervised accuracy to 75.7%.
A reproducible benchmarking pipeline improves reliability.
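The unsupervised clustering results are scored with the Adjusted Rand Index (ARI). ARI measures agreement between predicted clusters and reference labels, corrected for chance: 1 means identical partitions, and values near 0 mean chance-level agreement.

Below is a minimal, dependency-free sketch computed from the pair-counting contingency table. The name `adjusted_rand_index` is a hypothetical helper. In practice, scikit-learn's `adjusted_rand_score` computes the same quantity.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two partitions of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    # Contingency table: number of items in each (true class, predicted cluster) cell.
    cells = Counter(zip(labels_true, labels_pred))
    row_sums = Counter(labels_true)
    col_sums = Counter(labels_pred)

    # Pairs of items grouped together in both partitions.
    index = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: trivially identical partitions
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both partitions are all singletons.
        return 1.0
    return (index - expected) / (max_index - expected)
```

Labels can be any hashable values, so string family names and integer cluster IDs can be compared directly. Relabeling clusters does not change the score, which is why ARI suits comparing clusterings across layers.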
Abstract
Protein language models (PLMs) have transformed sequence-based protein analysis, yet most applications rely only on final-layer embeddings, which may overlook biologically meaningful information encoded in earlier layers. We systematically evaluate all 33 layers of ESM-2 for kinase function prediction using both unsupervised clustering and supervised classification. We show that mid-to-late transformer layers (layers 20-33) outperform the final layer by 32 percent in Adjusted Rand Index (ARI) for unsupervised clustering and improve homology-aware supervised accuracy to 75.7 percent. Domain-level extraction, calibrated probability estimates, and a reproducible benchmarking pipeline further strengthen reliability. Our results demonstrate that different transformer depths encode functionally distinct biological signals and that principled layer selection significantly improves kinase function prediction.
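The abstract mentions calibrated probability estimates but does not name a calibration metric. One common check is expected calibration error (ECE). ECE bins predictions by confidence, takes the gap between mean confidence and accuracy in each bin, and averages those gaps weighted by bin size.

The sketch below makes these assumptions, none taken from the paper:

- Bins are equal-width.
- The top predicted class probability is used as the confidence.
- The function name and the default of 10 bins are illustrative choices.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted mean |accuracy - confidence| over equal-width confidence bins."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equally long, non-empty inputs")
    n = len(confidences)
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        # Bin i covers roughly [i/n_bins, (i+1)/n_bins); 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece
```

A perfectly calibrated classifier scores 0. For example, a model that predicts with confidence 1.0 but is right only half the time scores 0.5.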
Taxonomy
Topics: Machine Learning in Bioinformatics · Biomedical Text Mining and Ontologies · Bioinformatics and Genomic Networks
