# High-quality mouse reference genomes reveal the structural complexity of the murine protein-coding landscape

**Authors:** Mohab Helmy, Jin U. Li, Xinyu F. Yan, Rachel K. Meade, Elizabeth Anderson, Patrick B. Chen, Anne M. Czechanski, Tomás Di Domenico, Jonathan Flint, Erik Garrison, Marco T.P. Gontijo, Andrea Guarracino, Leanne Haggerty, Edith Heard, Kerstin Howe, Narendra Meena, Fergal J. Martin, Eric A. Miska, Isabell Rall, Navin B. Ramakrishna, Alexandra Sapetschnig, Swati Sinha, Diandian Sun, Francesca F. Tricomi, Runjia Qu, Jonathan M.D. Wood, Tianzhen Wu, Dian J. Zhou, Laura Reinholdt, David J. Adams, Clare M. Smith, Jingtao Lilue, Thomas M. Keane

PMC · DOI: 10.1016/j.xgen.2025.101074 · 2025-12-01

## TL;DR

Long-read genome assemblies for 17 inbred mouse strains resolve complex immune-related regions and improve strain-specific RNA-seq analysis.

## Contribution

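The study provides 17 fully annotated long-read genomes (12 classical laboratory strains and 5 wild-derived strains) that resolve previously incomplete regions, localize non-reference genes absent from GRCm39, and improve RNA-seq analysis.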

## Key findings

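- Resolved previously incomplete regions, including the MHC, the defensin cluster, and the T cell receptor and Ly49 complexes.
- Identified over 400 genes with coding-region VNTR polymorphisms, with up to 600 repeat copies and repeat units of up to 990 nucleotides.
- Improved RNA-seq analysis with strain-specific annotations: in PWK/PhJ, read mapping improved by 5.1%, and 2.1% of coding genes showed expression-level differences relative to GRCm39.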
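
To make the VNTR finding above concrete, here is a minimal sketch of what a copy-number polymorphism means. This is not the authors' pipeline, which this summary does not describe. The function name, the strain labels, the toy sequences, and the `CAG` repeat unit are all hypothetical and chosen only for illustration.

```python
def max_tandem_copies(sequence: str, unit: str) -> int:
    """Return the longest run of back-to-back copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    k = len(unit)
    best = 0
    for start in range(len(sequence) - k + 1):
        copies = 0
        pos = start
        # Walk forward one unit at a time while the unit keeps repeating.
        while sequence[pos:pos + k] == unit:
            copies += 1
            pos += k
        best = max(best, copies)
    return best


# Hypothetical coding sequences for two strains (toy data, not from the paper).
toy_alleles = {
    "strain_A": "ATG" + "CAG" * 3 + "TAA",
    "strain_B": "ATG" + "CAG" * 5 + "TAA",
}

copies = {name: max_tandem_copies(seq, "CAG") for name, seq in toy_alleles.items()}
# The locus is polymorphic when strains differ in copy number.
is_polymorphic = len(set(copies.values())) > 1

print(copies)
print(is_polymorphic)
```

A real scan would need to discover repeat units rather than take one as input, and to tolerate imperfect copies; this sketch only counts exact copies of a known unit.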

## Abstract

We present a collection of 17 high-quality long-read inbred mouse strain genomes with complete annotation (contig N50s of 0.8–33.9 Mbp). This collection includes 12 widely used classical laboratory strains and 5 wild-derived strains. We have resolved previously incomplete genomic regions, including the major histocompatibility complex (MHC), defensin cluster, T cell receptor, and Ly49 complexes. Hundreds of non-reference genes from previous publications not found in GRCm39, such as Defa1, Raet1a, and Klra20 (Ly49T), were localized in the new reference genomes. We conducted a genome-wide scan of variable number tandem repeats (VNTRs) within the coding regions, identifying over 400 genes with VNTR polymorphisms with up to 600 repeat copies and repeat units reaching 990 nucleotides. Our strain-specific annotations enhance RNA sequencing (RNA-seq) analyses, as demonstrated in PWK/PhJ, where we observed a 5.1% improvement in read mapping and expression-level differences in 2.1% of coding genes compared to using GRCm39.
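
The abstract reports assembly contiguity as contig N50. As a reminder of what that statistic measures, the sketch below computes N50 under its usual definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name and the toy lengths are illustrative assumptions, not values from the paper.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running sum to half the
        # assembly length defines the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths (toy data).
toy_lengths = [40, 30, 20, 10]
print(contig_n50(toy_lengths))  # prints 30
```

A higher N50 means that half of the assembled bases sit in longer contigs, which is why the statistic is commonly used to summarize contiguity.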

- Collection of high-quality mouse reference genomes
- Insights into the structural complexity of key regions of the mouse genome
- Impact of using a strain-specific genome for RNA-seq analysis

Helmy et al. provide a collection of high-quality mouse reference genomes. They were able to resolve some of the most complex regions among mouse genomes that are involved in immune defense. These findings have an impact on a variety of mouse genetics experiments as well as the usage of mice as an animal model for medical research.

## Linked entities

- **Genes:** Defa1 (defensin, alpha 1) [NCBI Gene 13216], Raet1a (retinoic acid early transcript 1, alpha) [NCBI Gene 19368], Klra20 (killer cell lectin-like receptor subfamily A, member 20) [NCBI Gene 93967]
- **Species:** Mus musculus (taxon 10090)

## Full-text entities

- **Genes:** Raet1a (retinoic acid early transcript 1, alpha) [NCBI Gene 19368] {aka RAE-1alpha, Rae1a, Rae1alpha, Raet1}, Defa1 (defensin, alpha 1) [NCBI Gene 13216] {aka Defcr, Defcr1}, Klra20 (killer cell lectin-like receptor subfamily A, member 20) [NCBI Gene 93967] {aka Ly49t}, Klra (killer cell lectin-like receptor, subfamily A) [NCBI Gene 17055] {aka Ly-49, Ly49}
- **Species:** Mus musculus (house mouse, species) [taxon 10090]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12903361/full.md

---
Source: https://tomesphere.com/paper/PMC12903361