How heterogeneous thymic output and homeostatic proliferation shape naive T cell receptor clone abundance distributions
Renaud Dessalles, Yunbei Pan, Mingtao Xia, Davide Maestrini, Maria R., D'Orsogna, Tom Chou

TL;DR
This study models how heterogeneity in thymic output and T cell proliferation influences naive T cell receptor clone abundance distributions, revealing that proliferation heterogeneity is key to matching observed data.
Contribution
It introduces a mean-field birth-death-immigration model incorporating TCR-dependent heterogeneity, highlighting proliferation variability as crucial for clone abundance patterns.
Findings
Heterogeneity in proliferation rates significantly impacts clone abundance distributions.
Immigration rate heterogeneity has minimal effect on predicted clone counts.
Model constraints align with experimental clone abundance data.
Abstract
The set of T cells that express the same T cell receptor (TCR) sequence represents a T cell clone. The number of different naive T cell clones in an organism reflects the number of different T cell receptors (TCRs) arising from recombination of the V(D)J gene segments during T cell development in the thymus. TCR diversity and more specifically, the clone abundance distribution is an important factor in immune function. Specific recombination patterns occur more frequently than others while subsequent interactions between TCRs and self-antigens are known to trigger proliferation and sustain naive T cell survival. These processes are TCR-dependent, leading to clone-dependent thymic export and naive T cell proliferation rates. Using a mean-field approximation to the solution of a regulated birth-death-immigration model and a modification arising from sampling, we systematically quantify…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsT-cell and B-cell Immunology · Immune Cell Function and Interaction · Monoclonal and Polyclonal Antibodies Research
How heterogeneous thymic output and homeostatic proliferation shape naive T cell receptor clone abundance distributions
Renaud Dessalles1, Yunbei Pan1,2, Mingtao Xia3, Davide Maestrini1 Maria R. D’Orsogna1,2, Tom Chou1,3,
1 Dept. of Computational Medicine, UCLA, Los Angeles, CA 90095-1766
2 Dept. of Mathematics, CalState-Northridge, Los Angeles, CA 91330
3 Dept. of Mathematics, UCLA, Los Angeles, CA 90095-1555 E-mail: [email protected]
Abstract
The set of T cells that express the same T cell receptor (TCR) sequence represents a T cell clone. The number of different naive T cell clones in an organism reflects the number of different T cell receptors (TCRs) arising from recombination of the V(D)J gene segments during T cell development in the thymus. TCR diversity and more specifically, the clone abundance distribution is an important factor in immune function. Specific recombination patterns occur more frequently than others while subsequent interactions between TCRs and self-antigens are known to trigger proliferation and sustain naive T cell survival. These processes are TCR-dependent, leading to clone-dependent thymic export and naive T cell proliferation rates. Using a mean-field approximation to the solution of a regulated birth-death-immigration model and a modification arising from sampling, we systematically quantify how TCR-dependent heterogeneities in immigration and proliferation rates affect the shape of clone abundance distributions (the number of different clones that are represented by a specific number of cells, or “clone counts”). By comparing predicted clone counts derived from our heterogeneous birth-death-immigration model with experimentally sampled clone abundances, we show that although heterogeneity in immigration rates causes very little change to predicted clone-counts, significant heterogeneity in proliferation rates is necessary to generate the observed abundances with reasonable physiological parameter values. Our analysis provides constraints among physiological parameters that are necessary to yield predictions that qualitatively match the data. Assumptions of the model and potentially other important mechanistic factors are discussed.
Author Summary
It is not known how naive T cell receptor (TCR)-dependent thymic output, proliferation, and death rates control the overall diversity of T cells in the body. Using a reasonable range of physiological parameters within a birth-death-immigration (BDI) model, we quantitatively show how heterogeneity influences the expected sampled clone abundances. We find that heterogeneity in clone-specific immigration rate only weakly affects sampled clone counts while modest heterogeneity in proliferation rates can dramatically affect clone counts. Fitting our model to recently published data, we impose quantitative constraints on biologically meaningful rate parameters in the model. These findings are consistent with the known proliferation-driven maintenance of T cell population in humans.
Introduction
Naive T cells play a fundamental role in the immune system’s response to pathogens, tumors, and other infectious agents. These cells are produced in the thymus, circulate through the blood, and migrate to the lymph nodes where they may be presented with different antigen proteins from various pathogens. Naive T cells mature in the thymus where the so-called V, D, and J segments of genes that code T cell receptors undergo rearrangement. Most T cell receptors (TCRs) are comprised of an alpha chain and a beta chain that are formed after VJ segment and VDJ segment recombination, respectively. The number of possible TCR gene sequences is extremely large, but while recombination is a nearly random process, not all TCRs are formed with the same probability. Before export to the periphery, T cells undergo a selection process, during which T cells with TCRs that react to self proteins are eliminated.
The unique receptors expressed on the cell surface of circulating TCRs enable them to recognize specific antigens; well-known examples include the naive forms of helper T cells (CD4+) and cytotoxic T cells (CD8+). The set of naive T cells that express the same TCR are said to belong to the same T cell clone. Upon encountering the antigens that activate their TCRs, naive T cells turn into effector cells that assist in eliminating infected cells. Effector cells die after pathogen clearance, but some develop into memory T cells. Because of the large number of unknown pathogens, TCR clonal diversity is a key factor for mounting an effective immune response. Recent studies also reveal that human TCR clonal diversity is implicated in healthy aging, neonatal immunity, vaccination response and T cell reconstitution following haematopoietic stem cell transplantation [1, 2]. Despite the central role of the naive T cell pool in host defense, and broadly speaking in health and disease, TCR diversity is difficult to quantify. For example, the human body hosts a large repertoire of T cell clones, however the actual distribution of clone sizes is not precisely known [3]. Only recently have experimental and theoretical efforts been devoted to understanding the mechanistic origins of TCR diversity [4, 5, 6, 7, 8, 9]. The goal of this work is to formulate a realistic mathematical model that includes heterogeneous naive T cell generation and reproduction rates and that we will use to describe recent experimental results.
A well-established way to describe the T cell repertoire is by determining the clone abundance distribution or “clone count” (for ) that measures the number of distinct clones represented by exactly T cells: , where is the discrete number of T cells carrying TCR and the indicator function if and [math] otherwise. This distribution captures the entire pattern of the clonal populations. Several summary indices for T cell diversity such as Shannon’s entropy, Simpson’s index, or the whole population richness can be deduced from the distribution [10]. Note that counts only the number of clones of a specific population and does not carry any TCR sequence or identity information.
Complete clone counts and the total number of circulating naive T cells are difficult to measure in humans. Nonetheless, high-throughput DNA sequencing on samples of peripheral blood containing T cells [11, 12, 13, 14] have provided some insight into TCR diversity. A commonly observed feature is that clone counts often exhibit a power-law distribution in the clone abundance , albeit for approximately one or at most two decades (see Fig. 1).
Several models have been developed to explain certain features of observed clone counts [3, 15, 16, 17, 4], including the apparent power-law behavior. One proposal is that T cells in different clones have TCRs that have different affinities for self-ligands that are necessary for peripheral proliferation [4, 5, 6], leading to clone specific replication rates. An alternative hypothesis [7] is that specific TCR sequences are more likely to arise in the V(D)J recombination process in the thymus [18] leading to a higher probability that these TCRs are produced. De Greef et al. [7] estimated the probability of production of a given TCR sequence by using the Inference and Generation of Repertoires (IGoR) simulation tool that quantitatively characterizes the statistics of receptor generation from both cDNA and gDNA data [18]. However, the data indicate that power laws, which represent a simple summary of sampled clone counts, arise only across one or two decades of clone sizes, as shown in Fig. 1. Moreover, the above studies have not systematically incorporated and compared heterogeneity in both immigration and replication rates, sampling, or fitting to measured TCR clone abundance distributions.
In this paper, we analyze the effects of heterogeneity and sampling within a mean-field model based on a stochastic multiclone birth-death-immigration (BDI) process. The model allows for TCR-sequence dependent replication and immigration rates. Our model is derived from an underlying continuous-time Markovian birth-death-immigration (BDI) process [10] described by: (i) immigration representing the arrival of new clones from the thymus; (ii) birth during homeostatic proliferation of naive T cells that yield newborn naive T cells with the same TCR as their parent; and (iii) death representing cell apoptosis. We also include a regulation, or “carrying capacity,” mechanism through a total population-dependent death rate which may represent the global competition for cytokines, such as Interleukine-7 [19, 20, 21, 22, 23], needed for naive T cell survival and homeostasis [24, 25]. This interaction will ensure a homeostatic finite naive T cell population and will be assumed to be clone-independent since these cytokine signals are TCR-independent [21].
We derive analytical results of our heterogeneous BDI model that are applicable on the scale of the entire organism. These results are then extended to predict the expected clone counts in a small subsample of the population. Finally, this result is averaged over distributions of TCR immigration (thymic output) and homeostatic proliferation rates. Heterogeneity in TCR production rates are extracted from new computational tools: Inference and Generation of Repertoires (IGoR) [18] and Optimized Likelihood estimate of immunoGlobulin Amino-acid sequences (OLGA) [26]. Since there are no equivalent tools that measures proliferation rates, we will assume simple functional forms for the distribution of homeostatic proliferation rates.
Our calculations provide insight into how parameters describing the shape of the distribution of immigration and proliferation rates affect the shape of the expected clone counts. To compare with experimental measurements, we also quantify the random sampling process that describes actual measurements derived from blood draws. Our model allows us to estimate the values of relevant physiological parameters and reveals that the shape of the sampled expected clone counts are much less sensitive to TCR-dependent thymic output than to TCR-dependent proliferation. However, we show how the width of a simple uniform proliferation rate distribution can inform the observed .
Materials and Methods
To understand the observed clone counts, we focus on the clone abundance distribution associated only with naive T cells, the first type of cells produced by the thymus that have not yet been activated by any antigen. Antigen-mediated activation initiates a one-way irreversible cascade of differentiation into effector and memory T cells that we can subsume into a death rate. Thus, we limit our analysis to birth, death, and immigration within the naive T cell compartment. Here, we first present the mathematical framework of the BDI process to provide an initial qualitative understanding for clone counts.
Non-neutral Birth-Death-Immigration model
The multiclone BDI process is depicted in Fig. 2.
We define to be the theoretical number of possible functional naive T cell receptor clones that can be generated by V(D)J recombination in the thymus which is estimated to be [6, 26]. As we will later show, results of our model will not be depend the explicit value of as long as . Due to naive T cell death or removal from the sampling-accessible pool, not all possible clone types will be presented in the organism, so we denote the number of clones actually present in the body (or “richness”) by , where estimates of range from in mice and humans [29, 30, 1, 6, 32].
Although naive T cells are difficult to distinguish from the entire T cell population, the total number of naive T cells (across all clones present) in human has been estimated to be about . However, circulating naive T cells number perhaps but can exchange, at different time scales, with those that reside in peripheral tissue, which may carry their own proliferation and death rates. The effective pool that is ultimately sampled is thus difficult to estimate, but measurements show the theoretical number of different clones the total number of naive T cells the total number of different T cells clones actually in the body (). Regardless of the precise values of the discrete quantities , they are related to the discrete clone counts through
[TABLE]
where is the number of possible clones that are not expressed in the organism.
Each distinct clone (with ) is characterized by an immigration rate and a per cell replication rate . The immigration rate is clone-specific because it depends on the preferential V(D)J recombination process; the replication rate is also clone-specific due to the different interactions with self-peptides that trigger proliferation. Since both the numbers of theoretically possible () and observed () clones are extremely large, we can define a continuous density from which immigration and proliferation rates and are drawn. This means that the probability that a given clone has an immigration rate between and and replication rate between and is . Since is finite and countable, there are theoretical maximum values and for the immigration and proliferation rates, respectively, such that for and . In the BDI process, the upper bound on the proliferation rate prevents unbounded numbers of naive T cells and is necessary for a self-consistent solution. Conversely, results are rather insensitive to the upper bound on the immigration rate provided decays sufficiently fast such that the mean value of is finite for all . Thus, for simplicity, we henceforth take the limit. The heterogeneity in the immigration and replication rates allows us to go beyond typical “neutral” BDI models, where both rates are fixed to a specific value for all clones.
Finally, we assume the per cell death rate is clone-independent but a function of the total population . This dependence represents the competition among all naive T cells for a common resource (such as cytokines), which effectively imposes a carrying capacity on the population [22, 28, 33]. The specific form of the regulation will not qualitatively affect our findings since we will ultimately be interested in only its value at the steady state population .
Mean-Field Approximation of the BDI Process
The exact steady-state probabilities of configurations of the discrete abundances for a fully stochastic neutral BDI model with regulated death rate were recently derived [10]. To incorporate TCR-dependent immigration and replication rates in a non-neutral model, we must consider distinct values of and for each clone . In this case, an analytic solution for the probability distribution over , even at steady state, cannot be expressed in an explicit form. However, since the effective number of naive T cells ( [32]) is large, we can exploit a mean-field approximation to the non-neutral BDI model and derive expressions for the mean values of the discrete clone counts . We will show later that under realistic parameter regimes, the mean-field approximation is quantitatively accurate. Breakdown of the mean field approximation has been carefully analyzed in other studies [34].
Deterministic approximation for total population and effective death rate:
To implement the mean-field approximation in the presence of a general regulated death rate , we start by writing the deterministic, “mass-action” ODE for the mean density of cells having immigration rate near and proliferation rate near in a BDI process
[TABLE]
Since , the total number of clones that carry an immigration rate between and and a replication rate between and is approximately . The total mean number of naive T cells can then be estimated as a weighted integral over all
[TABLE]
At steady-state, the solution to Eq. 2 can be simply expressed as
[TABLE]
in which is the steady-state value of . Thus, upon averaging Eq. 4 over and , we find
[TABLE]
a self-consistent equation for if the functional form of is known. Without the finite upper bound of the density , the integral in Eq. 5 diverges. The self-consistent determination of depends implicitly on the parameters that define the distribution .
Note that for factorisable , the self-consistent death rate depends only on the combination , where and is the effective death rate at steady-state. We now nondimensionalize all rates with respect to the maximum proliferation rate . In these units, and . Henceforth, we will use the notation and so that the quantities and are meant to be dimensionless, unless otherwise stated. The steady-state self-consistent condition becomes
[TABLE]
where is the dimensionless ratio of the total naive T cell population to the total effective number of clones . We expect . For a given , one can then self-consistently determine in terms of and the parameters defining .
Mean-field model of clone counts:
We now use the results for the mean steady-state population to find the clone counts averaged over all realizations of the underlying stochastic process. The mean-field equations for the dynamics of these mean clone counts in the neutral model were derived in [35, 34] and are reviewed in Appendix 1. In the neutral model, we assume that all effective clones are associated with the same rates and so that the mean field evolution equation for is [35, 34]
[TABLE]
along with the constraint . This equation assumes that both and are uncorrelated, allowing us to write the last term as a product of functions of the mean population and .
At steady state we replace with self-consistently found by solving Eq. 6. The solution follows a negative binomial distribution with parameters and [10, 34]
[TABLE]
The dependence on arises from the determination of through Eq. 6. Although has not yet been averaged over , it also implicitly depends on parameters that define through the solution for from Eq. 6.
Sampling
Unless an animal is sacked and its entire naive T cell population is sequenced, TCR clone distributions are typically measured from sequencing TCRs in a small blood sample. In such samples, low population clones may be missed. In order to compare our predictions with measured clone abundance distributions, we must revise our predictions to allow for random cell sampling.
We define as the fraction of naive T cells in an organism that is drawn in a sample and assume that all naive T cells in the organism have the same probability of being sampled. This is true only if naive T cells carrying different TCRs are not preferentially partitioned into different tissues and are uniformly distributed within an animal. Let us assume that a specific TCR is represented by cells in an organism. If , the probability that cells from the same clone are sampled approximately follows a binomial distribution with parameters and [36, 37, 38, 39, 40]
[TABLE]
The associated mean sampled clone count depends on the clone count predicted in the body via the formula
[TABLE]
where is determined by Eq. 8. As mentioned, the clone counts and , before averaging over , depend implicitly on and other parameters that define through the determination of via Eq. 6. The sum in Eq. 10 can be explicitly performed to yield
[TABLE]
The total expected number of clones predicted in the sample comprised of cells within and can be found from direct summation:
[TABLE]
The effects of sampling are shown in Fig. 3. Subsampling greatly affects the observed clone counts, increasing the exponential decay of [38]. This shifts at large to much smaller values of , while reducing the values of for small . In Fig. 3(a) and (b) we use two very different parameter values and to generate two qualitatively very different . If the subsampling is sufficiently small, the resulting corresponding to qualitatively different can become qualitatively similar. This implies that sampling may make the estimation of whole-body clone counts from sampled data somewhat ill-conditioned, i.e., different whole-body clone counts, upon sampling, may yield similar sampled clone counts.
Although sampling has a dominating effect on the inference of , immigration and proliferation rate distributions may also strongly affect . By averaging the clone counts (Eq. 8), the sampled clone counts (Eq. 11), and the total clone count (Eq. 12) over the distribution , we can make predictions on expected clone counts in the body and in the sample.
Heterogeneity and determination of
The fundamental result given in Eq. 11 applies only to the clone count density (expected clone counts with immigration rate within and proliferation rates within ). Now, we propose forms for in order to evaluate . For simplicity, we assume that and are uncorrelated and that the distribution factorises: .
Proliferation rate heterogeneity:
First, we consider a distribution of TCR sequence-dependent proliferation rates. Since TCR-antigen affinity depends on the receptor amino-acid sequence, the rate of T cell activation and subsequent proliferation is clone-specific [41, 28]. Thus, the specific interactions between TCRs and low-affinity MHC/self-peptide complexes maps to a distribution of proliferation rates among all the possible clones. Since there are no data (known to us) that can be used to infer this mapping or the specific shape of , we assume, for simplicity, a simple uniform “box” distribution centered about a mean value :
[TABLE]
where represents the relative width of the uniform box distribution. The dimensionless self-consistency condition (Eq. 6) thus yields
[TABLE]
where is the mean per-clone immigration rate normalized by the maximum proliferation rate .
Before we use these formulae to understand the experimentally measured clone counts, we explore the effects of proliferation rate heterogeneity on the clone counts. In Fig. 4, we plot the predicted, whole-organism () clone counts for different values of . Since the function has an exponentially decaying term , conditions that lead to larger lead to the slowest decaying .
A fixed normalized value of leads to an exponential decay in of approximately . However, if is integrated over its full width of values , the contribution from generates a longer-tailed distribution . Note that for large , smaller gives rise to smaller , and even a modest is enough to make the decay slowly.
Immigration rate heterogeneity:
Next, we propose a way of evaluating formulating a distribution that represents the TCR sequence-dependent output from the thymus. A specific form for can be obtained from previous studies that predict V(D)J recombination frequencies associated with each TCR sequence. The statistical model for differential V(D)J recombination in humans is implemented in the Optimized Likelihood estimate of immunoGlobulin Amino-acid sequences (OLGA) software [26], which is an updated version of the Inference and Generation of Repertoires (IGoR) software [18]. Below, we estimate by sampling a large number of TCRs from OLGA that draws sequences according to their generation probability. Our working assumption is that thymic selection is uncorrelated with V(D)J recombination so the relative probabilities of forming different TCRs provide an accurate representation of the ratios of the TCRs exported into the periphery.
Both IGoR and OLGA can be used to generate the probabilities corresponding to each drawn sequence but this requires significant computational time and memory. However, since the sequence draws are proportional to the underlying probabilities, we simply drew sequences and exploited the probability-biased selection of sequences by counting the frequencies of each sequence. Out of sequence draws from IGoR or OLGA, there will be distinct sequences with of them occurring times.
For the alpha and beta chains, we found the frequencies , , where is the total number of different integer-valued frequencies observed. The frequencies of the drawn clones are plotted in Figs. 5(a) and (b) in decreasing order. For the alpha chain, we found while for the beta chain we found . The probability that a sequence has integer-valued frequency is . From these numbers, we can effectively average over by taking a sum over discrete values of : with
[TABLE]
where here, is the dimensionless per-clone immigration rate, averaged over the set of unique sequences.
Now that we have defined the possible distributions for and , we can compute the mean, sampled, immigration- and proliferation-averaged clone counts and compare them with measurements. The full formula for the clone counts is thus
[TABLE]
where is given by Eq. 15 and is given by Eq. 14. Eq. 16 is thus our “full model” from which we make predictions of clones count-related quantities and compare them with data. Using this equation, we can mathematically study the importance of the heterogeneities in and by comparing predictions from simpler forms of . In Appendix 2 in the SI, we provide explicit dimensionless formulae for different combinations of and . For example, we can consider fixed or by using or , respectively.
The effective distribution over different values indicates that most drawn clones only appear once since . Thus, we hypothesize that for sufficiently small , our formulae for and all subsequent quantities can be simplified by taking the limit . Indeed, as we show in Appendix 3, such a simpler expression remains highly accurate provided the dimensionless and allows efficient computation. Moreover, this implies that, mathematically, the result arising from weighting over can be approximated by a single effective value , reinforcing our overall conclusion that proliferation rate heterogeneity is the dominant factor. Nonetheless, we will use the full summation over (Eq. 16) in our subsequent analyses.
Results and Analysis
Before performing a quantitative comparison with measured clone counts from Oakes et al. [12], we discuss the qualitative features of our model and typical physiological parameter ranges. While even the basic model parameters are difficult to measure and vary widely across different organisms and individuals, our nondimensionalized model unifies the mechanisms and concepts common to the maintenance of diversity in the T cell repertoire across different organisms.
When considering the data, we observe that even after significant subsampling, there are appreciable clone counts at reasonably large clone sizes , whereas the unsampled clone counts decay exponentially in with rate . Even though may take on a range of values, as determined by , the slowest decay of in arises from the largest possible values of . Thus, a larger proliferation rate heterogeneity will generally yield a longer-tailed , as illustrated in Fig. 4.
Since the data we will analyze are derived from human samples, we will use the following arguments as a guide to roughly estimate the relevant range of parameters:
- •
The average total number of statistically relevant circulating naive T cells is not completely known but estimates are [32]. However, the circulating population in the peripheral blood is approximately two orders of magnitude smaller. These circulating naive T cells nonetheless exchange with those in the much larger population in the lymph and other tissues. The timescale of this exchange (relative to the age of the organism being sampled or the intersample times) will determine the effective relevant for sampling clone counts . Thus, we will use an order-of-magnitude estimate on the lower range of measurements and estimate .
- •
The total possible number of TCRs of either alpha or beta chains may be in the range [42], but the effective number is probably much smaller as many clones may never be generated in a lifetime. Thus, an effective value of may reside at the lower range which leads to .
- •
The average (dimensional) immigration rate per clone can be deduced from the total thymic output of all clones , which has been estimated across a wide range of values /day [43, 44, 45, 46, 47]. If we use an effective repertoire size of , the average per clone immigration rate becomes .
- •
The mean proliferation rate is difficult to measure but has been estimated to be on the order of [47]. If we nondimensionalize using , the dimensionless .
- •
The sampling fraction , although in principle determined experimentally, is also hard to quantify due to the uncertainty in . Blood sampling volume fractions from humans are typically ; however, in recent experiments [12] the number of enumerated sequences , which, given rough estimates of effective , yield . Due to this uncertainty in , we will explore different fixed values of around .
Using the above guide for reasonable parameter ranges, we now consider fitting our results in Eqs. 16, S9, S10, S11, S12, S13, and S14 to some of the available data [12]. Before doing so, note that although the log-log plots shown in Figs. 1(a,b) provide a simple visual for or , fitting must be performed on the linear scale.
The measured data includes data at values of for which no clones were detected so that . On the log scale these data points include which nonetheless should be included in the fitting as they represent realizations of the system. Numerical fitting on the log-log scale would be misleading once a value of is encountered. Thus, will fit our mean-field model on the linear scale to the fraction of the total number of sampled cells that are in clones of size :
[TABLE]
where the denominator comes directly from the definition , the sampling relation , and Eq. 6. Rather than using directly from the number of reads in an experimental sample, equivalently, we use the model expression to arrive at the last equality in Eq. 17. This form ensures strict normalization and is independent of the unknown repertoire size ( is proportional to ). Note that for , the implicit factor of in (Eq. 11) can cells the explicit in the denominator of Eq. 17. Thus, as well as depend on the effective value of only through the determination of through in Eq. 6.
Our mathematical framework provides only mean sampled clone counts while each sample of the data represents one realization. Large sample-to-sample variations in the clone counts would render the fitting less informative, but these large variations were not seen in the triplicate samples in Oakes et al. [12]. Mechanistically, we expect that for large the number of cells contributing to is also large so demographic stochasticity is relatively small and results in small uncertainties in the value of , and not in the magnitude of . Large clones are also likely include memory T cells that have been produced after antigen stimulation of specific clones. Memory T cells are difficult to accurately distinguish from naive T cells [12] but we will see that large components of negligibly influence the fitting.
For small , the numbers of clones are large so we expect sampling and intrinsic stochasticity to also be controlled. Thus, we averaged the clone counts from three different blood samples from each patient in Oakes et al. [12] and only compared them to the expected value of the sampled clone counts in our model.
By comparing our model with the data via the total least-squares error
[TABLE]
we explore how depends on the parameters , , , and sampling fraction .
Since the variation was not large and there were already too many uncertainties (particularly the sampling fraction , the shape of , and the accuracy of a steady-state approximation), we did not pursue a much more technical uncertainty analysis. Our goal was to simply consider the data using a reasonable initial model that captures the known general features of naive T cell maintenance (immigration, birth, death) and find constraints among the parameters , and .
In Figs. 6(a-c) the data were derived from the average of three measurements from beta chain CD4 sequences sampled from one patient [12]. Using this data, we compute and plot the error as a function of for various values of using the neutral model (, Eq. S9 in Appendix 2). For reasonable values of and sampling fraction , we see that the predicted is typically or larger. In Figs. 6(d-f) we use the full-width () distribution of to show the analogous error for the same data using the same sampling fractions . Note that the optimal (low error) values of are significantly smaller that those in found using (Figs. 6(a-c)) and that the results are rather insensitive to sampling fraction . These smaller values of are more consistent with known physiological understanding. Thus, the distributed proliferation rate model provides a much more self-consistent fit to the data than the fixed proliferation rate neutral model.
Note that the value of the errors along the minimum valley are nearly constant, only slightly decreasing as . This means that the model and associated data constrains to , but cannot independently determine them. The constraint arises as a linear relationship as shown in Fig. 7 that shows the relationship between and along the minimizer of .
Figs. 7 show the optimal as a function of for (a) the neutral model and (b) the full-width model. Although the variation in error is negligible across for each model (color variation along the lines), the full-width distributed model () generates a larger error than the neutral model, but it provides much more reasonable values of and is much less sensitive to . To investigate intermediate values of to determine if a low error can be achieved at small values of , we plot in Fig. 7(c) the optimal as a function of proliferation rate heterogeneity . Note that even a small heterogeneity significantly reduces the predicted . However, if or say, the required can become quite large.
To explore the dependence of the error on the proliferation rate heterogeneity, we fix , and , and plot the error as a function of .
Fig. 8 shows how the minimizing value of is sensitive to , suggesting larger proliferation heterogeneity as is decreased. The minimum of the error, however, is rather insensitive to . For sufficiently small , the minimum error at is higher than those with minimum . Since the minima in the interior are nearly invariant to , we indeed show that near-optimal solutions with can be found when the proliferation rate heterogeneity is appreciable.
Using the parameters associated with the minima in Fig. 8, we plot our predicted against the data in Fig. 9.
Although the neutral model appears to fit better, especially for small , the necessary values of and are too large and small, respectively. Conversely, when proliferation rate heterogeneity is allowed, the best-fits have small error and are found using . Note that most of the information in the data lies in how decreases over the first few values of . The goodness of fit of our model to the data depends mostly on the predicted initial decreases in .
Discussion
Here, we review and justify a number of critical biological assumptions and mathematical approximations used in our analysis. The effects of relaxing our approximations are also discussed.
Factorisation of .
For mathematical tractability, we have assumed . Given the typical physiological values of , the clone count formulae derived from our model can be accurately approximated by a single value of . Thus, we expect that the immigration rate distribution can be approximated by . This allows further approximation of our formulae as shown in Appendix 3. In Appendix 4, we explicitly show that factorisation is an accurate approximation.
We have also assumed that selection is uncorrelated with the generation probabilities of the TCR nucleotide sequences encoded in IGoR/OLGA. The assumption is that the recombination statistics are uncorrelated with the statistics of thymic selection, a process that is based on TCR amino acid sequences. However, we note that it has been suggested that selection pressure may induce a correlation between TCRs generated and selected [48]. The corresponding statistics of the frequencies of selected TCRs would be modified from those of the generated TCRs shown in Figs. 5. Nonetheless, we assume that the resulting distribution can still be approximated by a single- model which will not qualitatively alter our conclusions.
Mean-field approximation.
Our mean-field approximation for the mean clone count is embodied in Eq. 7, where correlations between fluctuations in the total population in the regulation term and the explicit terms are neglected. This approximation has been shown to be accurate for when [34]. The mean-field results overestimate the clone counts for . Moreover, when the total steady-state T cell immigration rate is extremely small, the effects of competitive exclusion dominate and a single large clone arises due to competitive exclusion [49, 50, 34]. Nonetheless, an accurate approximation for the steady-state clone abundance can be obtained using a variation of the two-species Moran model as shown in [34].
For the naive T cell system, because is so large, the mean immigration rate is such that competitive exclusion is not a dominant feature. Moreover, since , clones counts at comparable sizes are not observed and predicted to be negligible in all models.
The values of become exponentially smaller for large , our inference is most sensitive to the values of for small to modest . The information in the data is primarily manifested by how the decays in , we before the mean-field approximation deviates from the exact solution. Thus, the parameters associated with the human adaptive immune system satisfy the conditions for the mean-field approximation to be accurate, justifying its use in the BDI model.
Steady state assumption.
In this study, we only considered the steady state of our birth-death-immigration model in Eq. 8 because this limit allowed relatively easy derivations of analytical results. This was also the strategy for previous modeling work [35, 4, 6, 7, 34]. However, the per-clone immigration and proliferation times may be on the order of months or years, a time scale over which thymic output diminishes as an individual ages [47, 51, 52, 33]. Indeed, clone abundance distributions have been shown to show specific patterns as a function of age [53, 54, 55].
Although , with fixed and relaxes to steady-state quickly, on a timescale of months, the different subpopulations of specific sizes described by their number relax to quasi-steady steady-state across a spectrum of time scales depending on the clone sizes [56, 34]. The timescales of relaxation of the largest clones can be estimated through the eigenvalues of linearized the Eqs. 7 to be years [33].
In addition to time-dependent changes in , more subtle time-inhomogeneities such as changes in proliferation and death rates have been demonstrated [51, 52]. Thus, our steady-state assumption should be relaxed by incorporation of time-dependent perturbations to the model parameters and/or . Longitudinal measurements of clone abundances or experiments involving time-dependent perturbations would provide significant insight into the overall dynamics of clone abundances. Finally, notice that the error slowly decreases as , especially for small . The requirement that requires a best fit that is extremely small. At such small , the time for the system to reach steady state scales as and the validity of the steady-state approximation is nullified.
Thus, given the multiple known underlying time-inhomogeneous processes that are known to exist, the steady-state may not be strictly reached and our analysis should be considered as an approximation.
Clustered Immigration.
Our mean field model assumed that each immigration event introduced a single naive T cell in the immune system. However, T cells can divide before leaving the thymus and reach a homeostatic state in the periphery. This process can be described by the simultaneous immigration of more than one naive T cell with the same TCR. Clustered immigration of cells can be implemented in the core model for (Eq. 7) via an immigration term of the form , where for (see Appendix 5). For , an informative analytic expression for is not available. In Fig. S2 in the Appendix, we show the predicted clone abundance for a neutral model in which . When compared to the case where there is only one cell per immigration, the clone abundance will have a larger slope for , making it kink more downward near . Thus, from Figs. S2 and 9(a), we can see that paired immigration () would increase for , providing an improved fitting to data over single copy immigration ().
Thus, in addition to appreciable sensitivity of the predicted clone counts to , we also expect clustered immigration defined through the immigration rates , to control the goodness of fit to data. Indeed, Fig. S2 suggests that the distribution of immigration cluster sizes , in addition to the proliferation rate heterogeneity , is an important determinant of measured clone counts and that may be constrained by data. We leave this for future investigation.
General conclusions.
We developed a heterogeneous multispecies birth-death-immigration model and analyzed it in the context of T cell clonal heterogeneity; the clone abundance distribution is derived in the mean-field limit. Unlike previous studies [4], our modeling approach incorporated sampling statistics and provided simple formulae, allowing us to predict clone abundances under different rate distributions for arbitrarily large systems (), without the need for simulation. The properties of the BDI model and the overall shape of the sampled clone count data renders the first few -values of or the most important for determining the constraints among the model parameters. In other words, only the initial rate of the decrease in for small governs the quality of fitting to the model, and one should not expect to be able to explicitly infer more than one or two free parameters.
Our heterogeneous BDI model produced mean sampled clone count distributions that we could directly compare with measured clone counts. The unsampled clone counts of the neutral model (homogeneous and ) follow a negative binomial distribution which is further modified upon sampling and distribution over the heterogeneous immigration and proliferation rates. Although we determined through a tool that used recombination statistics inferred from cDNA and gDNA sequences [18, 26], we found that the behavior of the model is rather insensitive to distributions with mean values much smaller than the largest proliferation rates . The model results are dominated by many low immigration-rate clones and a model that replaces with its mean value is sufficient.
Conversely, we find that the shape of the clone count profiles are quite sensitive to the proliferation rate heterogeneity . A small amount of heterogeneity quickly reduces the best-fit values of to reasonable values. For estimated values , , and small values of , requires a best-fit width . Heterogeneity is needed to generate clones of sufficiently large size that persist after sampling. Although the number of TCR clones with large proliferation rates may be small, such clones proliferate more rapidly contributing to higher clone counts at larger sizes. In particular, we found that the shape of expected clone abundance is sensitive to the behavior of the proliferation rate distribution near the maximum dimensional proliferation rate , .
The predicted clone counts are also modestly sensitive to the distribution of immigration cluster sizes (representing transient proliferation just before thymic output). When cells of a clone are simultaneously exported by the thymus, the predicted mean clone counts decay much more slowly for small (see Fig. S2). This modification will allow for better fitting since clustered immigration increases the predicted clone counts for larger , , etc., and eventually , etc. Thus, we expect that a model containing multiple clustered immigration rates will lower the error and provide better fitting, particularly at larger . Additional analysis using a distribution of immigration cluster sizes may allow this type of clone count data to reveal more information about the physiological mechanism of naive T cell maintenance.
Even assuming modest heterogeneity, our work lead to the conclusion that proliferation heterogeneity is the more important mechanism driving the emergence of the sampled clone count distributions (and ) [12]. These results are consistent with the finding that naive T cells in humans are maintained by proliferation rather than thymic output [9]. Since we have only investigated the effects of a uniform distribution for , further studies using more complex shapes of can be easily explored numerically using our modeling framework. Different parameter values and rate distributions appropriate for mice, in which naive T cells are maintained by thymic output, should also be explored within our modeling framework.
Acknowledgements
This work was supported by grants from the NIH through grant R01HL146552 and the NSF through grants DMS-1814364 (TC) and DMS-1814090 (MD). The authors also thank the Collaboratory in Institute for Quantitative and Computational Biosciences at UCLA for support to RD.
How heterogeneous thymic output and
homeostatic proliferation shape naive T cell receptor clone abundance distributions
S1: Mathematical Appendices
1 Neutral model
Here, we review the neutral model to provide insight into the properties of our heterogeneous BDI model. When there is no heterogeneity in either proliferation or immigration rates, . Upon inserting this expression for in Eq. 8, we find that the clone abundance follows a negative binomial distribution [10]:
[TABLE]
We can also express , the clone abundance distribution normalized by the mean richness in the body as
[TABLE]
which is a negative binomial distribution of parameters and . Using , , and , we find . In this regime, , for , can be approximated by a log-series distribution with parameter .
To mathematically show that converges to a log-series distribution when , consider a random variable that follows a negative binomial distribution of parameters and
[TABLE]
Note that the probability mass function of is given by as can be seen from Eq. S1, the clone abundance distribution for all possible clones, which includes , the number of all clones that are not represented in the organism. To find the clone abundance distribution , for all the clones present in the organism, we must exclude the case by marginalizing the distribution of over all :
[TABLE]
What remains is to show that the distribution of converges to a log-series distribution of parameter when . Consider the moment generating function of given by
[TABLE]
Since the moment generating function of a negative binomial distribution is known, and since (see Eq. S3), we can write
[TABLE]
For any , the limit of , yields
[TABLE]
If we apply this result to Eq. S6 for , we find
[TABLE]
which we recognize as the moment generating function of a log series distribution of parameter . Thus, we finally have
[TABLE]
2 Explicit forms using different ,
In the following, we propose four simplifying expressions for the heterogeneity-averaged clone counts derived from Eq. 16.
Clone-independent Neutral model:
First, consider the simplest case where all naive T cells carry the same immigration and proliferation rates and , respectively, and define . This case corresponds to and . The self-consistent condition for and become
[TABLE]
and the clone count can be explicitly simplified to
[TABLE]
The total sampled clone count is then
[TABLE]
Fixed , distributed :
Next, consider a common immigration rate for all T cell clones and a box distribution of full width . The integral over the dimensionless proliferation rate is now over and . The averaged clone counts are now explicitly
[TABLE]
The total sampled clone count can also be explicitly expressed the integral over from Eq. 12:
[TABLE]
Clone-specific immigration, fixed :
Finally, using the same rate dimensionalization as before (Eqs. S8), we find explicitly
[TABLE]
where here, the dependence also implicitly arises in through Eq. 15. As in Eq. S9, the factor of arises from our choice of nondimensionalization using , twice the naive T cell proliferation rate. Similarly, the total sampled clone count can be explicitly expressed as
[TABLE]
Note that here, is interpreted as the mean immigration rate over the different clones.
3 Small approximation
We show that if the support of is sufficiently small, terms of order can be neglected in the equation for . While is summed or integrated over, for reasonable distributions , the lowest few rates contribute the most and the average over can be replaced by a single effective value . Even though is integrated over , and the region near would make large, the contribution from the is also small near . We have numerically checked that for all cases of , we can approximate by
[TABLE]
This form is derived by approximating the product term as and the exponential term . Thus, can also be approximated by
[TABLE]
where and is given by
[TABLE]
Note that the prefactor in Eq. S16 derived from or . Since the rest of the integrand is independent of , the irrelevance of the shape of is apparent. Only the mean value arises in this approximate calculation of .
We have explicitly shown that for small , this approximation is quantitatively accurate. This simpler form speeds up our numerical analysis and fitting to data. The parameters to infer thus . Even though is uncertain, it is in principle controlled by the experiment.
Although we evaluate the error between the model and date as a function of the parameters , we have not implemented priors for these parameters. We simply investigated the structure of the error and considered physiologically realistic regions in which .
4 Correlated immigration and proliferation .
Hitherto, we have considered independent immigration and proliferation, and assumed a factorisable rate distribution . However, immigration and proliferation rates may be correlated for certain clones. For example, a frequent realization of V(D)J recombination may also result in a TCR that is more likely to be activated for proliferation. In this case, would be positively correlated with . In Fig. S1 we consider the effect of correlated . For , we considered normalized, positively/negatively correlated box distributions as shown in Fig S1(a):
[TABLE]
Within our mean field model, these correlated distributions result in very similar expected clone abundance distributions (Fig S1(b)). This insensitivity to correlations between immigration and proliferation can be qualitatively understood by considering the “line integral” over dominant paths of in the vs. diagram. As shown in Fig. S1(c), both line integrals remain in the log-series distribution regime, indicating that the clone abundance distributions are qualitatively similar to that predicted by a model with proliferation heterogeneity alone.
5 for clustered immigration
We explore how clustered emigration from the thymus affects the mean clone count . Suppose that cells of the same clone (TCR nucleotide or amino acid sequence) are simultaneously exported by the thymus. The equation for the mean clone count is thus
[TABLE]
This equation does not admit a simple analytic solution so we numerically solved the equation assuming and . The shape of for single cell immigration () is compared to that of .
For , , and ultimately and is flatter up to , making the clone counts kink more downwards near . Thus, as can be seen from Fig. 9(a,b), we can reasonably conclude that some level of paired immigration would provide even better fits to the data at appropriately small values of , especially for the first few -points.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 11. Qi Q, Liu Y, Cheng Y, Glanville J, Zhang D, Lee JY, et al. Diversity and clonal selection in the human T-cell repertoire. Proceedings of the National Academy of Sciences (USA). 2014;111:13139–13144.
- 22. van den Broek T, Borghans JAM, van Wijk F. The full spectrum of human naive T cells. Nature Reviews Immunology. 2018;18:363–373.
- 33. Laydon DJ, Bangham CRM, Asquith B. Estimating T-Cell Repertoire Diversity: Limitations of Classical Estimators and a New Approach. Phil Trans R Soc B. 2015;370(1675):20140291. doi:10.1098/rstb.2014.0291.
- 44. Desponds J, Mora T, Walczak AM. Fluctuating Fitness Shapes the Clone-Size Distribution of Immune Repertoires. Proceedings of the National Academy of Sciences, USA. 2016;113(2):274–279. doi:10.1073/pnas.1512977112.
- 55. Desponds J, Mayer A, Mora T, Walczak AM. Population Dynamics of Immune Repertoires. ar Xiv preprint ar Xiv:170300226. 2017;.
- 66. Lythe G, Callard RE, Hoare RL, Molina-París C. How Many TCR Clonotypes Does a Body Maintain? Journal of Theoretical Biology. 2016;389:214–224.
- 77. de Greef PC, Oakes T, Gerritsen B, Ismail M, Heather JM, Hermsen R, et al. The naive T-cell receptor repertoire has an extremely broad distribution of clone sizes. bio Rxiv. 2019; p. 691501. doi:10.1101/691501.
- 88. Koch H, Starenki D, Cooper SJ, Myers RM, Li Q. power TCR: A Model-Based Approach to Comparative Analysis of the Clone Size Distribution of the T Cell Receptor Repertoire. PLOS Computational Biology. 2018;14(11):e 1006571. doi:10.1371/journal.pcbi.1006571.
