What cleaves? Is proteasomal cleavage prediction reaching a ceiling?
Ingo Ziegler, Bolei Ma, Ercong Nie, Bernd Bischl, David Rügamer, Benjamin Schubert, Emilio Dorigatti

TL;DR
This paper benchmarks recent deep learning models for proteasomal cleavage prediction, revealing that model complexity yields limited improvements due to biological noise and inherent unpredictability.
Contribution
It provides a comprehensive comparison of modern deep learning approaches on a new cleavage dataset, highlighting the limits imposed by biological noise rather than model choice.
Findings
Models reached about 88.5% AUC on C-terminal cleavage prediction.
Increasing model complexity offers limited performance gains.
Biological noise and complexity are the main limiting factors.
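The effect of label noise on a measured AUC can be illustrated with a toy simulation. The sketch below is not the paper's method or data: the symmetric label-flip noise model, the 10% noise rate, the sample size, and the helper names `auc` and `flip_labels` are all assumptions chosen for illustration. It scores a hypothetical "oracle" predictor that knows the true labels, first against clean labels and then against noisy ones.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def flip_labels(labels, noise_rate, rng):
    """Flip each binary label independently with probability noise_rate (assumed noise model)."""
    return [1 - y if rng.random() < noise_rate else y for y in labels]

rng = random.Random(0)  # fixed seed so the run is reproducible
true_labels = [rng.randint(0, 1) for _ in range(400)]

# Hypothetical oracle: its score equals the true label, so it is perfect on clean data.
oracle_scores = [float(y) for y in true_labels]

clean_auc = auc(oracle_scores, true_labels)
noisy_labels = flip_labels(true_labels, 0.1, rng)  # 10% noise is an arbitrary choice
noisy_auc = auc(oracle_scores, noisy_labels)

print(f"AUC against clean labels: {clean_auc:.3f}")
print(f"AUC against noisy labels: {noisy_auc:.3f}")
```

Under these assumptions even a perfect predictor scores below 1.0 when evaluated against noisy labels, which is one way a benchmark can plateau regardless of model choice. Real cleavage data need not follow this simple noise model.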
Abstract
Epitope vaccines are a promising direction to enable precision treatment for cancer, autoimmune diseases, and allergies. Effectively designing such vaccines requires accurate prediction of proteasomal cleavage in order to ensure that the epitopes in the vaccine are presented to T cells by the major histocompatibility complex (MHC). While direct identification of proteasomal cleavage in vitro is cumbersome and low throughput, it is possible to implicitly infer cleavage events from the termini of MHC-presented epitopes, which can be detected in large amounts thanks to recent advances in high-throughput MHC ligandomics. Inferring cleavage events in such a way provides an inherently noisy signal which can be tackled with new developments in the field of deep learning that supposedly make it possible to learn predictors from noisy labels. Inspired by such innovations, we sought to…
Taxonomy
Topics: Vaccines and immunoinformatics approaches · Immunotherapy and Immune Responses · Machine Learning in Bioinformatics
