# Assembly of a pangenome uncovers novel non-reference unique insertion sequences in cattle highlighting their genetic diversity

**Authors:** Valentin Sorin, Florian Besnard, Aurélien Capitan, Cécile Grohs, Maulana Mughitz Naji, Clémentine Escouflaire, Sébastien Fritz, Joanna Lledo, Camille Eché, Carole Iampietro, Cécile Donnadieu, Denis Milan, Laurence Drouilhet, Gwenola Tosser-Klopp, Didier Boichard, Christophe Klopp, Marie-Pierre Sanchez, Mekki Boussaha

PMC · DOI: 10.1186/s40104-026-01373-3 · Journal of Animal Science and Biotechnology · 2026-03-09

## TL;DR

A cattle pangenome reveals new genetic insertions linked to traits like milk production, showing the importance of breed diversity in genomic studies.

## Contribution

The study introduces a cattle pangenome graph revealing thousands of novel non-reference insertions associated with economically important traits.

## Key findings

- 101,219 structural variations were identified, including 33,634 non-reference unique insertions (NRUIs).
- NRUIs were enriched in QTL regions linked to milk production and morphological traits.
- Two Normande-specific NRUIs, located in introns of ARMH3 and EPHA5, were significantly associated with milk production and morphological traits, respectively.

## Abstract

The current cattle reference genome, derived from a single Hereford cow, does not capture the full spectrum of genetic diversity present within the species. Moreover, detecting structural variations (SVs; ≥ 50 nucleotides long) remains challenging with standard approaches that align either short or long reads to a linear reference genome. Recent advances in long-read sequencing technologies and graph-based assembly now enable the construction of breed-specific pangenomes, revealing previously uncharacterized genomic regions that may contribute to important agricultural traits.
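The 50-nucleotide size threshold mentioned above can be illustrated with a minimal sketch. It assumes variants are given as simple reference/alternate allele strings; the `Variant` class and `classify` function are hypothetical names for illustration and are not taken from the paper. Only length-changing events are handled, which is a deliberate simplification, since real SV callers also report inversions, duplications, and other types.

```python
from dataclasses import dataclass

# Size threshold quoted in the abstract: SVs are >= 50 nucleotides long.
SV_MIN_LENGTH = 50


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record (not the authors' data model)."""
    chrom: str
    pos: int
    ref: str  # reference allele sequence
    alt: str  # alternate allele sequence


def classify(variant: Variant, min_length: int = SV_MIN_LENGTH) -> str:
    """Label a variant by the net length change between its alleles.

    Assumption: only insertions and deletions are distinguished; any
    event shorter than `min_length` is reported as a small variant.
    """
    delta = len(variant.alt) - len(variant.ref)
    if abs(delta) < min_length:
        return "small_variant"
    return "insertion" if delta > 0 else "deletion"


if __name__ == "__main__":
    toy = [
        Variant("1", 100, "A", "A" + "C" * 50),  # 50 nt gained
        Variant("1", 200, "A" + "C" * 49, "A"),  # 49 nt lost
        Variant("1", 300, "A" + "G" * 60, "A"),  # 60 nt lost
    ]
    for v in toy:
        print(v.chrom, v.pos, classify(v))
```

The toy records are invented for the example and carry no biological meaning.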

In this study we constructed a cattle pangenome graph using 16 high-quality haplotype-resolved genome assemblies originating from nine breeds representing the diversity of French cattle populations, and including yak (Bos grunniens) as a close outgroup species. Using a trio-based strategy combined with complementary sequencing technologies and bioinformatics methods, we identified and characterized 101,219 structural variations. Of these, 33,634 were classified as non-reference unique insertions (NRUIs), adding several megabases of novel genomic sequences absent from the current Hereford reference genome.

Analysis of the distribution of these NRUIs revealed significant genome-wide enrichment within QTL regions associated with milk production and morphological traits, suggesting their contribution to the genetic basis of economically relevant phenotypes. Furthermore, their functional annotation highlighted two NRUIs located within the intronic regions of ARMH3 and EPHA5, both specific to the Normande breed and significantly associated with milk production and morphological traits, respectively.
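The idea of testing for enrichment within QTL regions can be sketched as follows. This summary does not describe the statistical test the authors used, so the example substitutes a simple binomial model: it assumes insertions would fall uniformly along the genome under the null hypothesis, so that the expected fraction inside QTL regions equals the fraction of the genome those regions cover. The function names and the toy numbers are hypothetical and are not results from the paper.

```python
import math


def binomial_sf(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p)
            + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def qtl_enrichment(n_insertions: int, n_in_qtl: int,
                   qtl_bp: int, genome_bp: int) -> tuple[float, float]:
    """Fold enrichment and one-sided p-value under a uniform-placement null.

    Assumption: each insertion lands in a QTL region independently with
    probability qtl_bp / genome_bp. This is a stand-in model, not the
    method used in the study.
    """
    expected_fraction = qtl_bp / genome_bp
    expected_count = n_insertions * expected_fraction
    fold = n_in_qtl / expected_count
    p_value = binomial_sf(n_in_qtl, n_insertions, expected_fraction)
    return fold, p_value


if __name__ == "__main__":
    # Toy inputs only: 1,000 insertions, 150 of them inside QTL regions
    # that cover 10% of a hypothetical genome.
    fold, p_value = qtl_enrichment(1000, 150, qtl_bp=100, genome_bp=1000)
    print(f"fold enrichment: {fold:.2f}, p-value: {p_value:.3g}")
```

A uniform null is a strong simplification, because insertions and QTL regions both cluster with gene density and repeat content. A permutation scheme that shuffles insertion positions while preserving chromosome and local context would be a more conservative alternative.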

Our findings demonstrate the value of pangenome approaches for uncovering functionally relevant SVs, particularly NRUIs, that are systematically absent from the current reference genome. By linking these variants to economically important traits, our work underscores the need to incorporate breed diversity into future genomic analyses and reference-building efforts in cattle.

The online version contains supplementary material available at 10.1186/s40104-026-01373-3.

## Linked entities

- **Genes:** ARMH3 (armadillo like helical domain containing 3) [NCBI Gene 79591], EPHA5 (EPH receptor A5) [NCBI Gene 2044]
- **Species:** Bos grunniens (taxon 30521)

## Full-text entities

- **Genes:** ABO (ABO, alpha 1-3-N-acetylgalactosaminyltransferase and alpha 1-3-galactosyltransferase) [NCBI Gene 515340], MATN3 (matrilin 3) [NCBI Gene 540041], GNRHR (gonadotropin releasing hormone receptor) [NCBI Gene 281798], KIT (KIT proto-oncogene, receptor tyrosine kinase) [NCBI Gene 280832] {aka c-kit}, ASIP (agouti signaling protein) [NCBI Gene 404192], EPHA5 [NCBI Gene 538224]
- **Diseases:** CHA (MESH:C483999), TPM (OMIM:602482), GSD (MESH:D010262), pigmentation (MESH:D010859)
- **Chemicals:** fatty acids (MESH:D005227), lipid (MESH:D008055), agarose (MESH:D012685)
- **Species:** Homo sapiens (human, species) [taxon 9606], Physeter macrocephalus (sperm whale, species) [taxon 9755], Tursiops truncatus (Atlantic bottlenose dolphin, species) [taxon 9739], Bos indicus (Indicine cattle, species) [taxon 9915], Capra hircus (domestic goat, species) [taxon 9925], Bos grunniens (domestic yak, species) [taxon 30521], Equus caballus (domestic horse, species) [taxon 9796], Bos taurus (bovine, species) [taxon 9913], Bos mutus (wild yak, species) [taxon 72004], Mus musculus (house mouse, species) [taxon 10090], Bison bison bison (subspecies) [taxon 43346], Balaenoptera musculus (blue whale, species) [taxon 9771], Bovidae (family) [taxon 9895], Ovis aries (domestic sheep, species) [taxon 9940], Delphinapterus leucas (beluga, species) [taxon 9749]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12969903/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12969903/full.md

## References

7 references — full list in the complete paper: https://tomesphere.com/paper/PMC12969903/full.md

---
Source: https://tomesphere.com/paper/PMC12969903