# NGSMHC: a simple bioinformatics tool for comprehensively typing major histocompatibility complex genes in non-human species using next-generation sequencing data

**Authors:** Mingue Kang, Byeongyong Ahn, Jae Yeol Shin, Jongan Lee, Eun Seok Cho, Chankyu Park

PMC · DOI: 10.5713/ab.25.0468 · Animal Bioscience · 2025-09-30

## TL;DR

NGSMHC is a bioinformatics tool for typing MHC genes in non-human species from next-generation sequencing data. It is most accurate with long-read sequencing.

## Contribution

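NGSMHC provides a streamlined MHC genotyping workflow for non-human species. It builds phased haplotype contigs from BAM-format read alignments and assigns alleles by BLAST against a user-provided reference allele set. Long-read sequencing data substantially improve its typing accuracy.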

## Key findings

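- On short-read WGS data, NGSMHC agreed closely with PCR-SBT for SLA-3 (88%), SLA-DRB1 (92%), and SLA-DQB1 (100%).
- On long-read WGS data, NGSMHC correctly identified all tested SLA genotypes (SLA-1, SLA-2, SLA-3, SLA-DRA, SLA-DRB1, SLA-DQA, and SLA-DQB1).
- SLA-2 showed lower concordance with short-read data (58%), likely because of its high sequence similarity to other SLA class I genes and its complex intra-locus polymorphisms.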
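Concordance between two typing methods can be counted per animal, where both alleles must agree, or per allele. This summary does not say which convention the paper uses, so the minimal sketch below supports both. The animal IDs and allele labels are hypothetical. With 12 animals, a genotype-level rate moves in steps of 1/12 and an allele-level rate in steps of 1/24.

```python
"""Sketch: concordance between two diploid typing results (e.g. NGS vs PCR-SBT)."""

from collections import Counter
from typing import Dict, Tuple

Genotype = Tuple[str, str]


def shared_alleles(a: Genotype, b: Genotype) -> int:
    """Alleles in common between two genotypes, ignoring order (multiset overlap)."""
    return sum((Counter(a) & Counter(b)).values())


def concordance(called: Dict[str, Genotype], reference: Dict[str, Genotype],
                level: str = "allele") -> float:
    """Fraction of agreement over the animals typed by the reference method.

    level="genotype": an animal counts only if both alleles agree.
    level="allele":   each of the 2N alleles is scored separately.
    """
    animals = list(reference)
    if level == "genotype":
        agree = sum(Counter(called[x]) == Counter(reference[x]) for x in animals)
        return agree / len(animals)
    if level == "allele":
        agree = sum(shared_alleles(called[x], reference[x]) for x in animals)
        return agree / (2 * len(animals))
    raise ValueError(f"unknown level: {level!r}")


# Hypothetical animals and allele labels, for illustration only.
ngs = {"pig1": ("A*01", "A*02"), "pig2": ("A*03", "A*03"), "pig3": ("A*01", "A*04")}
sbt = {"pig1": ("A*02", "A*01"), "pig2": ("A*03", "A*05"), "pig3": ("A*01", "A*04")}

print(concordance(ngs, sbt, "allele"))    # 5 of 6 alleles agree
print(concordance(ngs, sbt, "genotype"))  # 2 of 3 animals agree
```

Using `Counter` makes the comparison order-independent, so (A*02, A*01) and (A*01, A*02) count as the same genotype. A homozygous call is also scored correctly against a heterozygous reference.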

## Abstract

Understanding the individual- and population-level polymorphisms of major histocompatibility complex (MHC) genes is crucial for identifying associations between MHC variations and immune phenotypes. To support this, we developed NGSMHC, a streamlined bioinformatics tool for efficient and accurate MHC genotyping using next-generation sequencing (NGS) data in non-human species.

NGSMHC constructs phased haplotype contigs of selected MHC genes from BAM-format mapping data and determines the best matching MHC alleles and genotypes via nucleotide BLAST analysis against a user-provided reference set of MHC alleles. We evaluated NGSMHC using short-read whole-genome sequencing (WGS) data from 12 pigs, focusing on swine leukocyte antigen (SLA) genes. The typing results from NGSMHC were compared to those obtained using polymerase chain reaction sequence-based typing (PCR-SBT). In addition, we tested NGSMHC on a publicly available long-read WGS dataset with known SLA genotypes.
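The allele-assignment step can be illustrated with a minimal Python sketch. It assumes the phased contigs were searched with BLASTN against the reference allele set and saved in BLAST's 12-column tabular format (`-outfmt 6`). It ranks hits by bit score, then percent identity, then alignment length. This ranking rule is an illustrative assumption, not NGSMHC's documented scoring. The helper names, contig names, allele names, and scores are also assumptions.

```python
"""Hypothetical sketch: assign a best-matching reference allele to each
phased haplotype contig from BLASTN tabular output (-outfmt 6)."""

from dataclasses import dataclass
from typing import Dict, List, Tuple

# The 12 default columns of BLAST -outfmt 6, in order.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    contig: str    # query: a phased haplotype contig
    allele: str    # subject: a reference MHC allele
    pident: float
    length: int
    bitscore: float


def parse_outfmt6(text: str) -> List[Hit]:
    """Parse tabular BLAST output, skipping blank and comment lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = dict(zip(FIELDS, line.split()))
        hits.append(Hit(cols["qseqid"], cols["sseqid"], float(cols["pident"]),
                        int(cols["length"]), float(cols["bitscore"])))
    return hits


def best_allele_per_contig(hits: List[Hit]) -> Dict[str, str]:
    """Keep the top hit per contig.

    Assumption: rank by bit score, then percent identity, then alignment
    length. This rule is illustrative, not NGSMHC's documented criterion.
    """
    best: Dict[str, Hit] = {}
    for h in hits:
        cur = best.get(h.contig)
        if cur is None or (h.bitscore, h.pident, h.length) > (
                cur.bitscore, cur.pident, cur.length):
            best[h.contig] = h
    return {contig: h.allele for contig, h in best.items()}


def call_genotype(contig_alleles: Dict[str, str], hap1: str, hap2: str) -> Tuple[str, ...]:
    """Combine the two haplotype calls into an unordered diploid genotype."""
    return tuple(sorted((contig_alleles[hap1], contig_alleles[hap2])))


# Illustrative BLAST rows: contig names, allele names and scores are made up.
BLAST_DEMO = """\
SLA-3_hap1  SLA-3*04:01  100.00  1089  0   0  1  1089  1  1089  0.0  2012
SLA-3_hap1  SLA-3*06:01   99.08  1089  10  0  1  1089  1  1089  0.0  1945
SLA-3_hap2  SLA-3*01:01   99.91  1089  1   0  1  1089  1  1089  0.0  2006
SLA-3_hap2  SLA-3*04:01   98.90  1089  12  0  1  1089  1  1089  0.0  1934
"""

calls = best_allele_per_contig(parse_outfmt6(BLAST_DEMO))
genotype = call_genotype(calls, "SLA-3_hap1", "SLA-3_hap2")
print(calls)
print(genotype)
```

A real pipeline would also need to handle ties between alleles that are identical over the contig, and contigs that cover only part of an allele. Both cases are omitted here for brevity.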

The short-read WGS data showed an average read depth of 20.9× across the SLA region, enabling typing of SLA-2, SLA-3, SLA-DRB1, and SLA-DQB1 using NGSMHC. The concordance rates between NGSMHC and PCR-SBT were 88% for SLA-3, 92% for SLA-DRB1, and 100% for SLA-DQB1. However, SLA-2 typing showed lower concordance (58%), likely due to its high sequence similarity with other SLA class I genes and complex intra-locus polymorphisms. In contrast, NGSMHC accurately identified all tested SLA genotypes—including SLA-1, SLA-2, SLA-3, SLA-DRA, SLA-DRB1, SLA-DQA, and SLA-DQB1—when applied to the long-read WGS data.
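The 20.9× figure is an average of per-base read depth over the SLA region. One common way to obtain such a value is to run `samtools depth -a -r <region> sample.bam` and average the third column. This approach is assumed here, not taken from the paper. The `-a` flag matters because, without it, positions with zero coverage are omitted and the mean is inflated. The chromosome name, positions, and depths below are placeholders.

```python
"""Sketch: mean per-base read depth over a region from `samtools depth -a` output."""

from typing import Optional


def mean_depth(depth_text: str, chrom: Optional[str] = None,
               start: Optional[int] = None, end: Optional[int] = None) -> float:
    """Average the depth column of (chrom, pos, depth) lines for one BAM file.

    Optional chrom/start/end (1-based, inclusive) further restrict the positions.
    Returns 0.0 if no position is selected.
    """
    total = 0
    n = 0
    for line in depth_text.splitlines():
        if not line.strip():
            continue
        c, pos_s, depth_s = line.split()[:3]
        pos = int(pos_s)
        if chrom is not None and c != chrom:
            continue
        if start is not None and pos < start:
            continue
        if end is not None and pos > end:
            continue
        total += int(depth_s)
        n += 1
    return total / n if n else 0.0


# Placeholder output; a real run would cover the whole SLA region.
DEPTH_DEMO = """\
7 100 18
7 101 22
7 102 21
7 103 0
"""

print(mean_depth(DEPTH_DEMO))               # zero-depth site included
print(mean_depth(DEPTH_DEMO, "7", 100, 102))
```

Dropping the zero-coverage row, as `samtools depth` does without `-a`, would raise the mean here from 15.25 to about 20.3. Averaging over several animals would be a separate step on the per-sample values.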

NGSMHC is a simple and effective tool for MHC genotyping using NGS data, particularly for non-human species. Its accuracy is significantly improved by long-read sequencing, underscoring the importance of read length in precise MHC allele determination.

## Linked entities

- **Genes:** SLA-3 (MHC class I antigen 3) [NCBI Gene 100037288], SLA-DRB1 (MHC class II histocompatibility antigen SLA-DRB1) [NCBI Gene 100153386], SLA-DQB1 (SLA-DQ beta1 domain) [NCBI Gene 100037921], SLA-DRA (MHC class II DR-alpha) [NCBI Gene 100135040], SLA-DQA1 (MHC class II histocompatibility antigen SLA-DQA) [NCBI Gene 100153387]
- **Species:** Sus scrofa (taxon 9823)

## Full-text entities

- **Genes:** SLA2 (Src like adaptor 2) [NCBI Gene 84174] {aka C20orf156, MARS, SLAP-2, SLAP2}, HLA-C (major histocompatibility complex, class I, C) [NCBI Gene 3107] {aka D6S204, HLA-JY3, HLAC, HLC-C, MHC, PSORS1}, SLA (Src like adaptor) [NCBI Gene 6503] {aka SLA1, SLAP}
- **Species:** Homo sapiens (human, species) [taxon 9606], Sus scrofa (pig, species) [taxon 9823]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12877382/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12877382/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/PMC12877382/full.md

---
Source: https://tomesphere.com/paper/PMC12877382