ImmSET: Sequence-Based Predictor of TCR-pMHC Specificity at Scale
Marco Garcia Noceda, Matthew T Noakes, Andrew FigPope, Daniel E Mattox, Bryan Howie, Harlan Robins

TL;DR
ImmSET is a scalable, sequence-based transformer model that predicts TCR-pMHC specificity, outperforming existing methods and demonstrating robustness and generalizability in immune interaction prediction.
Contribution
The paper introduces ImmSET, a novel transformer architecture for modeling multi-sequence interactions, with improved performance and robustness over prior approaches in TCR-pMHC specificity prediction.
Findings
Given sufficient training data, ImmSET outperforms pipelines based on AlphaFold2 and AlphaFold3.
Performance of ImmSET scales positively with training data volume.
The reported performance of prior sequence-based methods was inflated by a specific failure mode.
Abstract
T cells are a critical component of the adaptive immune system, playing a role in infectious disease, autoimmunity, and cancer. T cell function is mediated by the T cell receptor (TCR) protein, a highly diverse receptor targeting specific peptides presented by the major histocompatibility complex (pMHCs). Predicting the specificity of TCRs for their cognate pMHCs is central to understanding adaptive immunity and enabling personalized therapies. However, accurate prediction of this protein-protein interaction remains challenging due to the extreme diversity of both TCRs and pMHCs. Here, we present ImmSET (Immune Synapse Encoding Transformer), a novel sequence-based architecture designed to model interactions among sets of variable-length biological sequences. We train this model across a range of dataset sizes and compositions and study the resulting models' generalization to pMHC…
