Predicting the spectrum of TCR repertoire sharing with a data-driven model of recombination
Yuval Elhanati, Zachary Sethna, Curtis G. Callan Jr., Thierry Mora, Aleksandra M. Walczak

TL;DR
This paper presents a data-driven model that accurately predicts TCR sequence sharing across individuals by accounting for recombination biases and thymic selection, advancing understanding of immune repertoire publicness.
Contribution
The study introduces a model combining recombination biases and thymic selection to predict TCR sharing, and develops a classifier for public/private sequence determination.
Findings
The model accurately predicts TCR sharing across individuals.
Sharing depends on cohort size, sampling depth, and sequence features.
The PUBLIC classifier performs well even with small cohorts.
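The dependence of sharing on cohort size, sampling depth, and the sequence itself can be illustrated with a small calculation. The sketch below is a simplified illustration, not the authors' model: it assumes each individual's sample consists of independent draws, that a given sequence has the same per-draw probability `p_seq` in every individual, and that individuals are independent. The function names and parameters are hypothetical.

```python
import math


def presence_probability(p_seq, sample_size):
    """Probability that a sequence appears at least once in one sample.

    Assumes sample_size independent draws, each producing the sequence
    with probability p_seq (0 <= p_seq < 1). Computes 1 - (1 - p_seq)**N
    in a numerically stable way for very small p_seq.
    """
    return -math.expm1(sample_size * math.log1p(-p_seq))


def sharing_probability(p_seq, sample_size, cohort_size, min_individuals):
    """Probability that the sequence is found in at least min_individuals
    out of cohort_size independently sampled individuals (binomial tail).
    """
    p = presence_probability(p_seq, sample_size)
    return sum(
        math.comb(cohort_size, k) * p**k * (1.0 - p) ** (cohort_size - k)
        for k in range(min_individuals, cohort_size + 1)
    )


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the trend.
    for depth in (10_000, 100_000, 1_000_000):
        prob = sharing_probability(1e-6, depth, cohort_size=10, min_individuals=5)
        print(f"depth={depth:>9,d}  P(shared by >= 5 of 10) = {prob:.3g}")
```

Under these assumptions, the same sequence can look private at shallow sampling depth and public at greater depth, which is why a public/private label is only meaningful relative to a stated cohort size and sample size.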
Abstract
Despite the extreme diversity of T cell repertoires, many identical T-cell receptor (TCR) sequences are found in a large number of individual mice and humans. These widely-shared sequences, often referred to as 'public', have been suggested to be over-represented due to their potential immune functionality or their ease of generation by V(D)J recombination. Here we show that even for large cohorts the observed degree of sharing of TCR sequences between individuals is well predicted by a model accounting for the known quantitative statistical biases in the generation process, together with a simple model of thymic selection. Whether a sequence is shared by many individuals is predicted to depend on the number of queried individuals and the sampling depth, as well as on the sequence itself, in agreement with the data. We introduce the degree of publicness conditional on the queried…
