# TCR2HLA: Calibrated inference of HLA genotypes from TCR repertoires enables identification of immunologically relevant metaclonotypes

**Authors:** Koshlan Mayer-Blackwell, Anastasia Minervina, Mikhail Pogorelyy, Puneet Rawat, Melanie R. Shapiro, Leeana D. Peters, Emily S. Ford, Amanda L. Posgai, Kasi Vegesana, Samuel Minot, David M. Koelle, Victor Greiff, Philip Bradley, Todd M. Brusko, Paul G. Thomas, Andrew Fiore-Gartland

PMC · DOI: 10.1371/journal.pcbi.1013767 · 2026-01-16

## TL;DR

TCR2HLA is an open-source tool that infers HLA genotypes from TCRβ repertoires, enabling discovery of TCRs associated with specific HLA alleles and with exposures such as SARS-CoV-2.

## Contribution

Introduces TCR2HLA, a calibrated framework for inferring HLA genotypes from TCR repertoires and identifying immunologically relevant TCRs.

## Key findings

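- Binary classifiers built on TCRβ features from four datasets (3,150 individuals) reached balanced accuracy above 0.9 for most common HLA-A (9/12), HLA-B (9/12), and DRB1 (11/11) alleles, and for a smaller share of HLA-C alleles (6/13) and DPA1/DPB1 (6/10) and DQA1/DQB1 (8/17) heterodimers.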
- Identified ~96,000 TCRβ features strongly associated with specific HLA alleles from 71M input TCRs.
- Enabled discovery of SARS-CoV-2-related TCRs in a large COVID-19 dataset lacking HLA data.
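
The abstract reports classifier performance as balanced accuracy, which is the mean of sensitivity and specificity and is therefore less flattered by rare alleles than plain accuracy. A minimal sketch of the metric for one binary allele classifier follows; the function name and the toy labels are illustrative assumptions, not taken from the TCR2HLA code base.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = allele carried)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one carrier and one non-carrier")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy example (made-up labels): 2 carriers, 4 non-carriers.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)
```

Here one of two carriers and three of four non-carriers are classified correctly, so sensitivity is 0.5, specificity is 0.75, and balanced accuracy is 0.625, whereas plain accuracy would be 4/6.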

## Abstract

T cell receptors (TCRs) recognize peptides presented by polymorphic human leukocyte antigen (HLA) molecules, but HLA genotype data are often missing from TCR repertoire sequencing studies. To address this, we developed TCR2HLA, an open-source tool that infers HLA genotypes from TCRβ repertoires. Expanding on work linking public TRBV-CDR3 sequences to HLA genotypes, we incorporated “quasi-public” metaclonotypes – composed of rarer TCRβ sequences with shared amino acid features – enriched by HLA genotypes. Using four TCRβseq datasets from 3,150 individuals, we applied TRBV gene partitioning and locality-sensitive hashing to identify ~96,000 TCRβ features strongly associated with specific HLA alleles from 71M input TCRs. Binary HLA classifiers built with these features achieved high balanced accuracy (>0.9) across common HLA-A (9/12), B (9/12), C (6/13), DRB1 (11/11) alleles and prevalent DPA1/DPB1 (6/10), DQA1/DQB1 (8/17) heterodimers. We also introduced a high-sensitivity calibration to support predictions in samples with as few as 5,000 unique clonotypes. Calibrated predictions with confidence filtering improved reliability. Beyond genotype imputation, TCR2HLA enables the discovery of novel HLA- and exposure-associated TCRs, as shown by the identification of SARS-CoV-2 related TCRs in a large COVID-19 dataset lacking HLA data. TCR2HLA provides a scalable framework for bridging the gap between TCRseq data and HLA genotype for biomarker discovery.
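
To illustrate the idea of partitioning by TRBV gene and then hashing to find similar CDR3 sequences, the sketch below groups clonotypes that share a TRBV gene and differ at exactly one CDR3 position. It uses a simple masked-position hash as a stand-in; the abstract does not specify the hashing scheme TCR2HLA actually uses, and the function names and sequences here are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def masked_keys(trbv, cdr3):
    """Yield one hash key per CDR3 position, with that position masked."""
    for i in range(len(cdr3)):
        yield (trbv, cdr3[:i] + "." + cdr3[i + 1:])


def near_neighbor_pairs(clonotypes):
    """Return pairs of distinct clonotypes with the same TRBV gene whose
    CDR3s have equal length and differ at exactly one position."""
    buckets = defaultdict(set)
    for trbv, cdr3 in clonotypes:
        for key in masked_keys(trbv, cdr3):
            buckets[key].add((trbv, cdr3))
    pairs = set()
    for members in buckets.values():
        # Distinct members of one bucket agree everywhere except the masked site.
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


# Made-up clonotypes for illustration only.
clonotypes = [
    ("TRBV7-9", "CASSLGQAYEQYF"),
    ("TRBV7-9", "CASSLGQGYEQYF"),  # one mismatch from the first
    ("TRBV5-1", "CASSLGQAYEQYF"),  # same CDR3, different TRBV partition
    ("TRBV7-9", "CASSPGTGYEQYF"),  # two mismatches from the second
]
pairs = near_neighbor_pairs(clonotypes)
```

Because every key includes the TRBV gene, comparisons never cross gene partitions, and each sequence is hashed once per position instead of being compared against every other sequence.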

T cells are a crucial component of the immune system. Each T cell has a unique T cell receptor (TCR) that can detect a potential infection by recognizing protein fragments presented on the surface of cells, bound by human leukocyte antigen (HLA) proteins. The HLA proteins are highly variable across individuals and determine what part of a pathogen each person’s immune system can recognize. Thus, HLA genotype and pathogen exposure shape the composition of T cell memory over an individual’s lifetime and are important for interpreting their T cell responses and TCR repertoire. However, HLA genotype data are often absent from TCR sequencing studies. To help, we developed TCR2HLA, an open-source tool that infers HLA genotype from the thousands of TCR sequences recoverable in blood samples. Statistical inference of HLA genotype probability enables grouping of data from participants with shared HLA gene variants, unlocking approaches to identify TCRs that could be biomarkers of infection, autoimmunity, or vaccine response.
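
Grouping participants by inferred genotype can be sketched as follows, assuming each participant already has a calibrated probability per HLA allele. The thresholds (0.1 and 0.9), function names, and probabilities are hypothetical choices for illustration; they are not values reported for TCR2HLA.

```python
from collections import defaultdict


def call_allele(prob, lower=0.1, upper=0.9):
    """Return True/False for a confident call, or None when the
    calibrated probability falls in the uncertain middle band."""
    if prob >= upper:
        return True
    if prob <= lower:
        return False
    return None


def group_by_confident_allele(probs, lower=0.1, upper=0.9):
    """Map each allele to the participants confidently predicted to carry it."""
    groups = defaultdict(list)
    for participant, allele_probs in probs.items():
        for allele, p in allele_probs.items():
            if call_allele(p, lower, upper) is True:
                groups[allele].append(participant)
    return dict(groups)


# Made-up calibrated probabilities for three participants.
probs = {
    "P1": {"A*02:01": 0.97, "B*07:02": 0.50},
    "P2": {"A*02:01": 0.95, "B*07:02": 0.03},
    "P3": {"A*02:01": 0.04, "B*07:02": 0.92},
}
groups = group_by_confident_allele(probs)
```

Participants whose probability for an allele falls between the two thresholds are left out of that allele's group rather than forced into a call, which trades coverage for reliability.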

## Linked entities

- **Genes:** HLA-A (major histocompatibility complex, class I, A) [NCBI Gene 3105], HLA-B (major histocompatibility complex, class I, B) [NCBI Gene 3106], HLA-C (major histocompatibility complex, class I, C) [NCBI Gene 3107], HLA-DRB1 (major histocompatibility complex, class II, DR beta 1) [NCBI Gene 3123], HLA-DPA1 (major histocompatibility complex, class II, DP alpha 1) [NCBI Gene 3113], HLA-DPB1 (major histocompatibility complex, class II, DP beta 1) [NCBI Gene 3115], HLA-DQA1 (major histocompatibility complex, class II, DQ alpha 1) [NCBI Gene 3117], HLA-DQB1 (major histocompatibility complex, class II, DQ beta 1) [NCBI Gene 3119]
- **Proteins:** TCR (T cell receptor)
- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Genes:** HLA-DQB1 (major histocompatibility complex, class II, DQ beta 1) [NCBI Gene 3119] {aka CELIAC1, HLA-DQB, IDDM1}, HLA-DPB1 (major histocompatibility complex, class II, DP beta 1) [NCBI Gene 3115] {aka DPB1, HLA-DP, HLA-DP1B, HLA-DPB}, HLA-A (major histocompatibility complex, class I, A) [NCBI Gene 3105] {aka HLAA}, HLA-DRB1 (major histocompatibility complex, class II, DR beta 1) [NCBI Gene 3123] {aka DRB1, HLA-DR1B, HLA-DRB, SS1}, HLA-DQA1 (major histocompatibility complex, class II, DQ alpha 1) [NCBI Gene 3117] {aka CELIAC1, DQ-A1, DQA1, HLA-DQA, HLA-DQA1*}, TRBV20OR9-2 (T cell receptor beta variable 20/OR9-2 (non-functional)) [NCBI Gene 6962] {aka CDR3, TCRBV20S2, TCRBV2O, TCRBV2S2O}
- **Diseases:** COVID-19 (MESH:D000086382)
- **Species:** Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049]

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12810895/full.md

---
Source: https://tomesphere.com/paper/PMC12810895