tcrLM: a lightweight protein language model for predicting T cell receptor and epitope binding specificity
Xing Fang, Chenpeng Yu, Shiye Tian, Hui Liu

TL;DR
tcrLM is a lightweight protein language model pretrained on more than 100 million distinct TCR CDR3 sequences; it improves the accuracy of TCR-antigen binding prediction and offers insight into immune response mechanisms relevant to immunotherapy.
Contribution
The paper introduces tcrLM, a lightweight masked language model trained on over 100 million TCR sequences, enhancing prediction of TCR-antigen binding and capturing biochemical properties.
Findings
tcrLM outperforms existing prediction methods.
It captures amino acid biochemical properties.
It predicts immunotherapy responses in melanoma.
Abstract
The anti-cancer immune response relies on the binding between T-cell receptors (TCRs) and antigens, which elicits adaptive immunity to eliminate tumor cells. The immune system's ability to respond to diverse novel neoantigens arises from the immense diversity of the TCR repertoire. However, this diversity poses a significant challenge to accurately predicting antigen-TCR binding. In this study, we introduce a lightweight masked language model, termed tcrLM, to address this challenge. Our approach randomly masks segments of TCR sequences and trains tcrLM to infer the masked segments, thereby enabling the extraction of expressive features from TCR sequences. To further enhance robustness, we incorporate virtual adversarial training into tcrLM. We construct the largest TCR CDR3 sequence set with more than 100 million distinct sequences, and pretrain tcrLM on these…
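The segment-masking objective described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the masking scheme, so everything here is an assumption made for illustration: the function name `mask_segment`, the `<mask>` token, the fixed span length of 3, and the example CDR3 string are hypothetical and are not taken from tcrLM itself.

```python
import random

# Hypothetical mask token; the actual tcrLM vocabulary is not given in the abstract.
MASK = "<mask>"


def mask_segment(seq, span_len=3, rng=None):
    """Mask one contiguous segment of an amino acid sequence.

    Returns the token list with the segment replaced by MASK, plus a dict
    mapping each masked position to the residue a model would have to infer.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    rng = rng or random.Random()
    span_len = min(span_len, len(seq))
    # Choose a start so that the whole span fits inside the sequence.
    start = rng.randrange(len(seq) - span_len + 1)
    tokens = list(seq)
    targets = {}
    for i in range(start, start + span_len):
        targets[i] = tokens[i]  # training label for this position
        tokens[i] = MASK        # what the model sees as input
    return tokens, targets


if __name__ == "__main__":
    # Illustrative CDR3-like string, assumed for this example only.
    tokens, targets = mask_segment("CASSLGQAYEQYF", span_len=3, rng=random.Random(0))
    print(tokens)
    print(targets)
```

In a masked language model, the loss would be computed only at the positions listed in `targets`; the unmasked residues serve as context. Real pipelines often mask several segments per sequence and vary the span length, which this sketch omits.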
Taxonomy
Topics: vaccines and immunoinformatics approaches · Monoclonal and Polyclonal Antibodies Research · Chemical Synthesis and Analysis
Methods: Sparse Evolutionary Training
