# Comparative genomic and phylogenetic analyses of Crataegus chloroplast genomes: insights for evolution and identification

**Authors:** Xinyu Sun, Mingqi Cui, Baipeng Zhao, Yu Wang, Xiao Zhang, Yuexue Liu

PMC · DOI: 10.3389/fpls.2026.1767012 · Frontiers in Plant Science · 2026-02-11

## TL;DR

This study sequences and compares 18 complete chloroplast genomes (17 Crataegus species and 1 Mespilus species) to clarify their evolution and improve species identification.

## Contribution

The study provides 18 newly sequenced complete chloroplast genomes and identifies nine divergent hotspot regions useful for phylogenetic analysis and species identification in Crataegus.

## Key findings

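- Eighteen complete chloroplast genomes (17 Crataegus species and 1 Mespilus species) were newly sequenced and analyzed; all show the typical quadripartite structure, range from 159,638 to 159,973 bp in length, and encode 119–131 genes.
- Of nine hotspot regions identified, ndhC~trnV-UAC was verified as a candidate basis for molecular markers to identify Crataegus species.
- Divergence time analysis places the crown age of C. subg. Crataegus at about 33.5 Ma, with divergence into C. subg. Americanae and C. subg. Sanguineae beginning around 27.1 Ma; the authors speculate that the genus originated from European-derived species within C. subg. Crataegus.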
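
This summary does not state how the hotspot regions were scored. One common way to locate divergent regions in aligned chloroplast genomes is sliding-window nucleotide diversity (pi), and the sketch below assumes that approach purely for illustration. The function names, the window and step sizes, and the toy alignment are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def pairwise_diff(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns containing a gap ('-') or an ambiguous base ('N') are skipped.
    """
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean pairwise difference per site (pi) over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def sliding_pi(alignment: list[str], window: int, step: int) -> list[tuple[int, float]]:
    """Return (window_start, pi) for each window along the alignment."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all sequences must have the same aligned length")
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in alignment]
        results.append((start, nucleotide_diversity(chunk)))
    return results


# Toy alignment (hypothetical data, not from the paper)
toy = [
    "ACGTACGTAAAA",
    "ACGTACGTAATA",
    "ACGTACGTCATA",
]
windows = sliding_pi(toy, window=4, step=4)
hotspot = max(windows, key=lambda w: w[1])  # window with the highest pi
print(windows, hotspot)
```

On real data the input would be a whole-genome alignment of the 18 accessions, and windows with the highest pi would be the candidate regions to inspect, for example an intergenic spacer such as ndhC~trnV-UAC.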

## Abstract

Crataegus spp. are valuable horticultural crops because of their extensive use in Chinese herbal medicines, cosmetics, food production, and other industries. However, the wide variety of species, similar morphological characteristics, inherent hybridization, apomixis, and polyploidy have led to confusion about their taxonomic status. Herein, a total of 18 complete chloroplast genomes, comprising 17 Crataegus species and 1 Mespilus species, were newly sequenced and comprehensively analyzed for comparative genomics and phylogenetic relationships. The 18 chloroplast genomes possessed typical quadripartite structures and ranged from 159,638 to 159,973 bp in length. These chloroplast genomes encode 119–131 genes, including 37 transfer RNA (tRNA) genes, 8 ribosomal RNA (rRNA) genes, and 74–85 protein-coding genes (PCGs). In addition, 23–54 long repeat sequences and 74–87 simple sequence repeats (SSRs) were detected. The examination of Ka/Ks ratios for the 18 chloroplast genomes revealed that the rpoC2 gene was significantly positively selected. Additionally, we identified nine distinct hotspot regions (infA, ndhC, psaI, rps19, ndhC~trnV-UAC, psbZ~trnG-UCC, rpl33~rps18, trnH-GUG~psbA, and trnR-UCU~atpA), and verified that ndhC~trnV-UAC might be used as a foundation for subsequent molecular marker studies aimed at identifying Crataegus species. Maximum likelihood and Bayesian phylogenetic trees using chloroplast genome sequences consistently revealed genetic relationships among Crataegus and Mespilus species, and confirmed the taxonomic status of Crataegus accessions (GSSZ, JRY, RR2H, RR3H, ZWSZ). Divergence time estimation placed the crown age of C. subg. Crataegus at about 33.487 Ma, with divergence into C. subg. Americanae and C. subg. Sanguineae beginning around 27.059 Ma. Based on the molecular evidence, we speculate that the genus Crataegus originated from European-derived species within C. subg. Crataegus.
Biogeographic and molecular dating analyses suggested that China represented a putative maternal origin of Crataegus species. The complete chloroplast genomes of Crataegus not only enable the resolution of phylogenetic relationships within the genus but also offer novel insights into chloroplast genome structure variation and evolution. Additionally, the identified divergent DNA regions hold significant utility for species identification and phylogenetic reconstruction in Crataegus.
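
The abstract reports 74–87 SSRs per genome but this summary does not give the detection tool or thresholds. As a rough illustration of how perfect SSRs can be found, the sketch below uses a regular expression with a backreference. The minimum repeat counts (10, 5, 4, 3, 3, 3 for mono- to hexanucleotide motifs) are an assumed, commonly used setting rather than the paper's, and `find_ssrs` and the test sequence are hypothetical.

```python
import re

# Minimum tandem copies per motif length.
# Assumed thresholds for illustration; the paper's settings are not given here.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq: str, min_repeats: dict[int, int] = MIN_REPEATS) -> list[tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for perfect SSRs; start is 0-based."""
    seq = seq.upper()
    hits = []
    for size, min_n in sorted(min_repeats.items()):
        # ([ACGT]{size}) captures one motif copy;
        # \1{min_n - 1,} requires at least min_n - 1 further tandem copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # so each tract is reported only under its smallest motif.
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)


# Hypothetical test sequence: a 12-bp poly-A tract and six copies of "AT"
example = "CG" + "A" * 12 + "CG" + "AT" * 6 + "CC"
ssrs = find_ssrs(example)
print(ssrs)
```

This only finds perfect repeats and ignores compound or interrupted SSRs, which dedicated tools handle; counts from it should not be expected to match the figures reported in the abstract.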

## Linked entities

- **Genes:** rpoC2 (RNA polymerase beta'' subunit) [NCBI Gene 800295], IFNA17 (interferon alpha 17) [NCBI Gene 3451], ndhC (NADH dehydrogenase subunit 3) [NCBI Gene 800470], RPS19 (ribosomal protein S19) [NCBI Gene 6223]

## Full-text entities

- **Genes:** ndhF [NCBI Gene 10043789], ycf3 [NCBI Gene 10043723], rpoC2 [NCBI Gene 10043706], trnG [NCBI Gene 10043719], psbZ [NCBI Gene 10043769], trnQ [NCBI Gene 10043695], rps7 [NCBI Gene 10043780], trnS [NCBI Gene 10043698], atpF [NCBI Gene 10043702], trnR [NCBI Gene 10043803], petD [NCBI Gene 10043758], ndhB [NCBI Gene 10043811], petB [NCBI Gene 10043757], trnY [NCBI Gene 10043713], trnW [NCBI Gene 10043746], psbA [NCBI Gene 10043691], rpoC1 [NCBI Gene 10043707], ndhA [NCBI Gene 10043798], trnH [NCBI Gene 10043817], rrn5 [NCBI Gene 10043773], trnM [NCBI Gene 10043771], clpP [NCBI Gene 10043816], TRNG (tRNA-Gly) [NCBI Gene 4563] {aka MTTG}
- **Diseases:** cardiovascular and cerebrovascular diseases (MESH:D002318), hypertension (MESH:D006973)
- **Chemicals:** AT (MESH:D001246), -nucleotide (MESH:D009711), xylan (MESH:D014990), Leu (MESH:D007930), polysaccharides (MESH:D011134), Met (MESH:D008715), flavonoids (MESH:D005419), Trp (MESH:D014364), GC (MESH:C057580), lipids (MESH:D008055), xylose (MESH:D014994), Ser (MESH:D012694), xylooligosaccharides (MESH:C570991), Arg (MESH:D001120), carbohydrates (MESH:D002241), amino acids (MESH:D000596), MW653326 (-)
- **Species:** Crataegus marshallii (species) [taxon 416288], Mespilus (genus) [taxon 36615], Crataegus chlorosarca (species) [taxon 416282], C. aurantia [taxon 189610], Crataegus phaenopyrum (species) [taxon 416292], Crataegus chungtienensis (species) [taxon 1961193], Crataegus kansuensis (species) [taxon 416286], Crataegus (hawthorn, genus) [taxon 23159], Crataegus rhipidophylla (species) [taxon 510738], Corynocarpus laevigatus (karaka, species) [taxon 4312], Amelanchier (genus) [taxon 23139], Crataegus hupehensis (species) [taxon 416285], Prunus maximowiczii (Korean cherry, species) [taxon 97306], Artemisia (genus) [taxon 4219], Culex mollis (species) [taxon 549331], Crataegus crus-galli (cockspur hawthorn, species) [taxon 216036], Mespilus germanica (medlar, species) [taxon 36616], Crataegus brachyacantha (species) [taxon 416280], Homo sapiens (human, species) [taxon 9606], Aletris spicata (species) [taxon 119992], Actaea dahurica (species) [taxon 64029]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12932471/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12932471/full.md

## References

92 references — full list in the complete paper: https://tomesphere.com/paper/PMC12932471/full.md

---
Source: https://tomesphere.com/paper/PMC12932471