# A pan-disease and population-level single-cell TCRαβ repertoire reference

**Authors:** Ziwei Xue, Lize Wu, Bing Gao, Ruonan Tian, Yiru Chen, Yicheng Qi, Tianze Dong, Yadan Bai, Yu Zhao, Bing He, Lie Wang, Zuozhu Liu, Jianhua Yao, Linrong Lu, Wanlu Liu

PMC12521495 · DOI: 10.1038/s41421-025-00836-7 · 2025-10-14

## TL;DR

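This study builds a paired single-cell RNA/TCR-seq reference of more than 2 million T cells from 583 individuals across 46 disease conditions, and finds that public TCRαβs shared across the population are associated with shared HLA alleles and likely target epitopes from common viruses.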

## Contribution

The study contributes a large-scale reference of paired single-cell TCRαβ sequences, gene expression, and HLA genotypes, together with TCR-DeepInsight, a computational framework for identifying HLA-shared and disease-associated TCRαβ clusters.

## Key findings

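- Public TCRαβs are widespread across the population and are associated with higher clonal expansion and shared HLA alleles; the most widely shared ones likely target epitopes from common viruses (EBV, CMV, IAV).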
- Germline-encoded TCR-MHC restriction features are revealed in CD4+/CD8+ T cell lineages.
- TCR-DeepInsight identifies HLA-shared and disease-associated TCR clusters with similar sequence and gene expression profiles.
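
To make the notion of a "public" TCRαβ concrete, the following minimal Python sketch flags paired clonotypes observed in more than one individual. It is an illustration only, not the paper's pipeline: the record layout, the `find_public_clonotypes` helper, the `min_individuals` threshold, and the toy CDR3 strings are all assumptions, and the paper's exact definition of publicity (for example, whether V/J genes must also match) is not given in this summary.

```python
from collections import defaultdict

# Hypothetical toy records: (individual_id, TRA CDR3, TRB CDR3).
# The sequences are illustrative placeholders, not data from the paper.
cells = [
    ("donor1", "CAVSDGGSQGNLIF", "CASSLGQAYEQYF"),
    ("donor1", "CAVSDGGSQGNLIF", "CASSLGQAYEQYF"),  # clonal expansion within donor1
    ("donor2", "CAVSDGGSQGNLIF", "CASSLGQAYEQYF"),  # same pair seen in a second donor
    ("donor2", "CAVRDSNYQLIW", "CASSPGTGGYEQYF"),
    ("donor3", "CAASGGSYIPTF", "CASSIRSSYEQYF"),
]


def find_public_clonotypes(records, min_individuals=2):
    """Return {(cdr3a, cdr3b): sorted donor list} for paired clonotypes
    observed in at least `min_individuals` distinct individuals.

    Assumption: a clonotype is defined here by exact identity of the paired
    alpha and beta CDR3 amino acid sequences.
    """
    donors_by_clonotype = defaultdict(set)
    for donor, cdr3a, cdr3b in records:
        # Sets collapse repeated cells from one donor, so expansion within a
        # single individual does not count toward publicity.
        donors_by_clonotype[(cdr3a, cdr3b)].add(donor)
    return {
        clonotype: sorted(donors)
        for clonotype, donors in donors_by_clonotype.items()
        if len(donors) >= min_individuals
    }


public = find_public_clonotypes(cells)
for (cdr3a, cdr3b), donors in public.items():
    print(cdr3a, cdr3b, donors)
```

Counting distinct individuals rather than cells keeps publicity (sharing across people) separate from clonal expansion (many cells of one clonotype within a person), two properties the study reports as associated.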

## Abstract

Recent advances in single-cell technology enable the simultaneous capture of T cell receptor (TCR) sequences and gene expression (GEX), providing an integrated view of T cell function. However, linking TCRαβ information and T cell phenotypes at the population level to elucidate their disease association remains an unaddressed gap. Here, by constructing a large-scale reference of paired single-cell RNA/TCR sequencing (scRNA/TCR-seq) comprising more than 2 million T cells from 70 studies, 1017 biological samples, 583 individuals, and 46 disease conditions, along with their single-cell transcriptome, full-length paired TCR, and human leukocyte antigen (HLA) genotypes, we revealed the intrinsic features of germline-encoded TCR-major histocompatibility complex (MHC) restriction in CD4+/CD8+ lineages. We also observed widely existing public TCRαβs across the population, associated with higher clonal expansion levels and shared HLA alleles. The most publicly shared TCRs are likely to target epitopes from common viruses, such as Epstein-Barr virus (EBV), cytomegalovirus (CMV), and influenza A virus (IAV). Furthermore, we introduced TCR-DeepInsight, a computational framework to identify HLA-shared and disease-associated TCRαβ clusters that exhibit similar TCR sequence and GEX profiles, extensible for researchers to incorporate their data with our reference and characterize potentially functional TCRs. In summary, our work presents a panoramic scTCRαβ reference and computational methods for TCR study.

## Full-text entities

- **Genes:** HLA-C (major histocompatibility complex, class I, C) [NCBI Gene 3107] {aka D6S204, HLA-JY3, HLAC, HLC-C, MHC, PSORS1}, TRBV20OR9-2 (T cell receptor beta variable 20/OR9-2 (non-functional)) [NCBI Gene 6962] {aka CDR3, TCRBV20S2, TCRBV2O, TCRBV2S2O}, CD4 (CD4 molecule) [NCBI Gene 920] {aka CD4mut, IMD79, Leu-3, OKT4D, T4}, HLA-A (major histocompatibility complex, class I, A) [NCBI Gene 3105] {aka HLAA}, CD8A (CD8 subunit alpha) [NCBI Gene 925] {aka CD8, CD8alpha, IMD116, Leu2, p32}
- **Species:** human gammaherpesvirus 4 (Epstein Barr virus, no rank) [taxon 10376], Influenza A virus (no rank) [taxon 11320], Cytomegalovirus (genus) [taxon 10358]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12521495/full.md

---
Source: https://tomesphere.com/paper/PMC12521495