Statistical inference of the generation probability of T-cell receptors from sequence repertoires
Anand Murugan, Thierry Mora, Aleksandra M. Walczak, Curtis G. Callan, Jr.

TL;DR
This paper develops a statistical inference method for the probability that VDJ recombination generates a given T-cell receptor sequence. It uses the method to characterize the biochemical events behind immune diversity and finds that they are consistent across individuals.
Contribution
It introduces a maximum likelihood method that infers, from sequence data, the probability distribution of the biochemical events underlying T-cell receptor generation. It separates the molecular recombination mechanism from selection effects by analyzing non-productive sequences.
Findings
The generative process is consistent across individuals.
The inferred distribution yields the generation probability of any given sequence.
Sequences shared between individuals can be explained by their generation probabilities.
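To see how a generation probability can account for sharing, consider a strong simplification: each individual's repertoire is a set of independent draws from the same generation distribution, with no selection or clonal expansion. Under that assumption, a sequence with generation probability p appears at least once among N draws with probability 1 - (1 - p)^N. The sketch below illustrates that link and is not the paper's calculation. The function names `prob_seen` and `expected_sharing` are hypothetical.

```python
import math


def prob_seen(pgen: float, n_draws: int) -> float:
    """P(a sequence appears at least once in n_draws independent draws).

    Assumes independent draws from one generation distribution (no selection,
    no clonal expansion). Uses log1p/expm1 so tiny pgen values keep precision.
    """
    if pgen >= 1.0:
        return 1.0
    return -math.expm1(n_draws * math.log1p(-pgen))


def expected_sharing(pgen: float, n_draws: int, n_individuals: int) -> float:
    """Expected number of individuals (each sampled n_draws times) carrying the sequence."""
    return n_individuals * prob_seen(pgen, n_draws)


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    print(expected_sharing(1e-8, 10**5, 10))
```

Under this model, sequences with high generation probability are expected in many individuals, while typical low-probability sequences are expected in almost none.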
Abstract
Stochastic rearrangement of germline DNA by VDJ recombination is at the origin of immune system diversity. This process is implemented via a series of stochastic molecular events involving gene choices and random nucleotide insertions between, and deletions from, genes. We use large sequence repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta chains to infer the statistical properties of these basic biochemical events. Since any given CDR3 sequence can be produced in multiple ways, the probability distribution of hidden recombination events cannot be inferred directly from the observed sequences; we therefore develop a maximum likelihood inference method to achieve this end. To separate the properties of the molecular rearrangement mechanism from the effects of selection, we focus on non-productive CDR3 sequences in T-cell DNA. We infer the joint distribution of…
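The abstract makes two points that a deliberately tiny toy model can illustrate. First, a sequence's generation probability is a sum over every hidden recombination scenario that could have produced it. Second, the event distributions are fitted by maximizing the likelihood of the observed sequences. This is a sketch under stated assumptions, not the authors' model. It uses a gene choice between two made-up germline strings (`GENES` is hypothetical, not real TRBV sequence), a single 3' deletion, and one insertion with independent nucleotides. It maximizes the likelihood with expectation maximization (EM), a standard choice for hidden-variable models that is assumed here. All function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

GENES = {"V1": "TGTGCC", "V2": "TGTAGC"}  # hypothetical toy germline segments
MAX_DEL = 3  # max nucleotides deleted from the 3' end of the gene
MAX_INS = 3  # max random nucleotides inserted after the deletion
NTS = "ACGT"


def scenarios(seq):
    """Yield every hidden scenario (gene, n_deleted, inserted) that produces seq."""
    for g, germ in GENES.items():
        for d in range(MAX_DEL + 1):
            kept = germ[: len(germ) - d]
            ins = seq[len(kept):]
            if seq.startswith(kept) and len(ins) <= MAX_INS:
                yield g, d, ins


def scenario_prob(params, g, d, ins):
    """P(scenario) = P(gene) * P(del | gene) * P(ins length) * prod P(nt)."""
    p = params["gene"][g] * params["del"][g][d] * params["ins_len"][len(ins)]
    for nt in ins:
        p *= params["nt"][nt]
    return p


def pgen(params, seq):
    """Generation probability: marginalize over all scenarios producing seq."""
    return sum(scenario_prob(params, *s) for s in scenarios(seq))


def uniform_params():
    return {
        "gene": {g: 1 / len(GENES) for g in GENES},
        "del": {g: [1 / (MAX_DEL + 1)] * (MAX_DEL + 1) for g in GENES},
        "ins_len": [1 / (MAX_INS + 1)] * (MAX_INS + 1),
        "nt": {nt: 0.25 for nt in NTS},
    }


def _norm(counts, fallback):
    """Normalize expected counts; keep the old values if all counts are zero."""
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else list(fallback)


def em_step(params, data):
    """One EM update: posterior-weighted counts (E-step), then renormalize (M-step)."""
    c_gene = defaultdict(float)
    c_del = {g: [0.0] * (MAX_DEL + 1) for g in GENES}
    c_len = [0.0] * (MAX_INS + 1)
    c_nt = defaultdict(float)
    for seq in data:
        scen = list(scenarios(seq))
        w = [scenario_prob(params, *s) for s in scen]
        z = sum(w)
        for (g, d, ins), wi in zip(scen, w):
            r = wi / z  # posterior probability of this hidden scenario
            c_gene[g] += r
            c_del[g][d] += r
            c_len[len(ins)] += r
            for nt in ins:
                c_nt[nt] += r
    genes, nts = list(GENES), list(NTS)
    return {
        "gene": dict(zip(genes, _norm([c_gene[g] for g in genes],
                                      [params["gene"][g] for g in genes]))),
        "del": {g: _norm(c_del[g], params["del"][g]) for g in genes},
        "ins_len": _norm(c_len, params["ins_len"]),
        "nt": dict(zip(nts, _norm([c_nt[n] for n in nts],
                                  [params["nt"][n] for n in nts]))),
    }


def log_likelihood(params, data):
    return sum(math.log(pgen(params, s)) for s in data)


def all_sequences():
    """Every distinct sequence the toy model can produce."""
    out = set()
    for germ in GENES.values():
        for d in range(MAX_DEL + 1):
            for n in range(MAX_INS + 1):
                for ins in product(NTS, repeat=n):
                    out.add(germ[: len(germ) - d] + "".join(ins))
    return out


if __name__ == "__main__":
    data = ["TGTGCC", "TGTAGCA", "TGTGA", "TGTAGC", "TGTGCCGT", "TGTAT"]  # made-up reads
    params = uniform_params()
    for _ in range(50):
        params = em_step(params, data)
    print(params["gene"], log_likelihood(params, data))
```

Even in this toy, the sequence TGTGCC has five scenarios: four deletion/insertion splits on V1, plus V2 trimmed to TGT followed by the insertion GCC. This is the ambiguity the abstract describes. EM handles it by weighting each scenario by its posterior probability instead of committing to a single alignment.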
