Characterization and Phylogenetic Analysis of the Complete Mitochondrial Genome of Celaenorrhinus victor (Lepidoptera: Hesperiidae)
Yaping Hu, Site Luo, Zhentian Yan, Xiaomin Ge, Le Wang, Xu Zhou, Bin Chen, Hui Ding, Xiao Zheng

TL;DR
This study reports the complete mitochondrial genome of Celaenorrhinus victor and uses it to explore evolutionary relationships within skipper butterflies.
Contribution
The paper provides the first complete mitogenome for Celaenorrhinus victor and contributes to resolving phylogenetic relationships in Hesperiidae.
Findings
The mitogenome of C. victor is 15,180 bp with typical mitochondrial gene content and a strong A + T bias.
Codon usage in Celaenorrhinus species is similar, with A + T mutational pressure influencing gene evolution.
Phylogenetic analysis supports C. victor's placement in Celaenorrhinus but shows limited resolution for deeper hesperiid relationships.
Abstract
Skipper butterflies (Hesperiidae) are diverse, yet mitochondrial genomic resources for many genera remain limited. Here, we report the complete mitochondrial genome of Celaenorrhinus victor and compare it with available congeners. The mitogenome is 15,180 bp long and contains the typical set of 37 mitochondrial genes and a control region, with a strong A + T bias that is common in insect mitochondria. Comparative analyses across Celaenorrhinus indicate broadly similar codon-usage profiles among species, and protein-coding genes show signals consistent with predominant purifying selection. Using a dataset of 66 complete mitogenomes (65 Hesperiidae plus one outgroup), we reconstructed a mitogenome-based phylogeny that supports the placement of C. victor within Celaenorrhinus and highlights relationships among close congeners. Deeper relationships among major hesperiid lineages show only…
Funding
- the Biodiversity Investigation, Observation and Assessment Program of the Ministry of Ecology and Environment of China
- the Basic Scientific Research Funds Programs in the National Public Welfare Research Institutes of China
- the Open Foundation of Scientific Observation and Research Station for Ecological Environment of Wuyi Mountains, Ministry of Ecology and Environment, China
Taxonomy
Topics: Genomics and Phylogenetic Studies · Lepidoptera: Biology and Taxonomy · RNA and protein synthesis mechanisms
1. Introduction
The family Hesperiidae, commonly known as skippers, represents one of the major groups within the superfamily Papilionoidea, encompassing more than 4000 species worldwide. Members of this family are distinguished by their stout bodies, hooked antennae and rapid, skipping flight. Despite this morphological distinctiveness, the evolutionary relationships among subfamilies and genera of Hesperiidae remain controversial, owing to morphological convergence and historically limited molecular sampling [1]. Within this family, the genus Celaenorrhinus Hübner, 1819 (Lepidoptera: Hesperiidae: Pyrginae), comprises over 100 described species distributed across tropical and subtropical regions of Asia, Africa and the Americas [2]. Notably, as typical representatives of butterflies, Celaenorrhinus species also serve as sensitive bioindicators of environmental changes. For example, butterfly assemblages have been used to track responses to extreme climate events and metal contamination, highlighting their potential value for forest ecosystem monitoring [3,4]. Their population dynamics, distribution patterns, and diversity are closely associated with habitat quality, climate fluctuations, and environmental pollution, which can reflect the health status of forest ecosystems [5,6]. Exploring the molecular characteristics and phylogenetic relationships of C. victor thus not only improves our understanding of the evolutionary history of this genus but also provides a molecular basis for subsequent ecological monitoring and conservation of butterfly indicator species. However, molecular data for Celaenorrhinus are still scarce, and its phylogenetic placement within Hesperiidae remains insufficiently resolved. 
As of our dataset compilation (31 December 2025), only a limited number of complete mitochondrial genomes were available for Celaenorrhinus in public repositories (n = 5), restricting genus-level comparative analyses of codon usage, selective constraints, and mitochondrial phylogeny.
Mitochondrial genomes (mitogenomes) have been widely applied in studies of molecular evolution, species identification and phylogenetic reconstruction in insects [7,8,9]. Compared with nuclear genomes, mitogenomes have several advantages, including small size (~15–17 kb), conserved gene content, maternal inheritance and relatively high substitution rates [10]. These features make mitogenomes particularly suitable for resolving taxonomic ambiguities and inferring evolutionary relationships at various taxonomic levels [11,12]. In Lepidoptera, an increasing number of complete mitochondrial genomes have been published, providing insights into genome organization, codon usage, gene rearrangement and evolutionary patterns [2,13,14]. Nevertheless, within Hesperiidae—and especially within Pyrginae—mitogenomic resources remain limited, constraining our understanding of the evolutionary history and diversification of skippers.
Celaenorrhinus victor (Devyatkin, 2003) is a representative species of Celaenorrhinus widely distributed in South and Southeast Asia [13]. Despite its ecological importance and distinctive morphology, the complete mitochondrial genome of C. victor has not yet been reported. Generating and analysing its mitogenome will not only clarify the genomic characteristics and evolutionary features of the genus but also refine the phylogenetic framework of Hesperiidae [15,16,17].
However, beyond reporting individual mitogenomes, genus-level syntheses that jointly evaluate synonymous codon usage, selective constraints (Ka/Ks), and phylogenetic placement within a broader hesperiid sampling remain scarce for Celaenorrhinus. In this study, we sequenced and comprehensively characterised the complete mitochondrial genome of C. victor using Illumina short-read sequencing. We analysed its genome organization, base composition, codon usage and gene arrangement, and performed phylogenetic analyses based on concatenated mitochondrial protein-coding genes (PCGs) from representative hesperiid species. By combining C. victor with congeneric mitogenomes, we further explored codon-usage bias and selective constraints at the genus level. Our findings provide new insights into mitogenome evolution and phylogenetic relationships within Hesperiidae and enrich the molecular resources available for butterflies of the subfamily Pyrginae.
2. Materials and Methods
2.1. Sampling, Identification and Vouchering
Adult specimens of Celaenorrhinus victor (Figure 1) were collected from Xishui County, Guizhou Province, China (28.550° N, 106.510° E; elevation ca. 1200 m) on 28 August 2020 using sweep nets during daytime activity [13]. Specimens were identified by Zhentian Yan (Chongqing Normal University) based on typical diagnostic characteristics, including narrow and elongated forewings, distinct yellow-orange macules on the hindwings, and a nearly entirely white dorsal surface of the antennae in males (Figure 1). A voucher specimen (Voucher ID: GZ1200m-2) was deposited in the insect collection of Chongqing Normal University (Chongqing, China). Thoracic muscle and legs were dissected from a single adult individual and preserved in 95–100% ethanol at −20 °C until DNA extraction. To place the mitogenome of C. victor in a genus-level context, we also retrieved four additional complete Celaenorrhinus mitochondrial genomes from GenBank (Table S1).
2.2. DNA Extraction, Library Preparation and Sequencing
Total genomic DNA was extracted from thoracic muscle using the TIANamp Genomic DNA Kit (TIANGEN, Beijing, China) following the manufacturer’s instructions. DNA integrity was evaluated by 1.0% agarose gel electrophoresis, and DNA concentration was quantified using a Qubit 4.0 fluorometer (Thermo Fisher Scientific, Waltham, MA, USA). A paired-end Illumina library with an average insert size of approximately 350 bp was constructed using the TruSeq DNA PCR-Free library preparation kit (Illumina, San Diego, CA, USA) according to the manufacturer’s protocol. The library was sequenced on an Illumina NovaSeq 6000 platform (Novogene, Beijing, China) to generate 150 bp paired-end reads (PE150).
Raw reads were subjected to adapter trimming and quality filtering with fastp (v1.0.1; parameters: --qualified_quality_phred 20 --length_required 140 --detect_adapter_for_pe) to remove adapter-contaminated reads, reads containing more than 10% ambiguous bases and reads with excessive low-quality bases [18]. Read quality before and after filtering was summarised using FastQC (v0.12.1) [19].
2.3. Mitochondrial Genome Assembly and Quality Assessment
Filtered Illumina reads were used for de novo assembly of the C. victor mitochondrial genome with NOVOPlasty (v4.3.5), a dedicated organelle assembler, run with default parameters and seeded with a cox1 barcode sequence from a congeneric hesperiid [20]. The expected mitogenome size was set to 15–20 kb based on previously reported Lepidoptera mitogenomes. The initial circular contig corresponding to the mitochondrial genome was identified based on coverage depth, contig length and gene content. To confirm its mitochondrial origin and exclude potential nuclear or contaminant sequences, we conducted BLASTN searches against insect mitochondrial genomes in GenBank and against a representative Lepidoptera nuclear genome [21]. Only the circular contig showing high similarity to previously published butterfly mitogenomes and containing the typical set of 37 mitochondrial genes (13 PCGs, 22 tRNAs and 2 rRNAs) was retained as the C. victor mitogenome. To further improve assembly accuracy, all filtered reads were mapped back to the circular mitogenome using BWA-MEM (v0.7.19) [22], and the consensus sequence was iteratively polished with Pilon (v1.24) to correct single-nucleotide errors and small indels [23]. Coverage was summarized with mosdepth from clean reads mapped back to the assembled mitogenome. The mitogenome was supported by a mean depth of 262.27×, and the circularization junction showed continuous read support in IGV. Consistent read coverage across the circularization junction, without zero-depth gaps, was taken as evidence for a correctly assembled single circular mitochondrial chromosome.
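The coverage check described above can be illustrated with a minimal Python sketch. It assumes that per-base depths have already been parsed into a list (for example from a per-base depth table); the function name `summarise_depth` and the toy depths are hypothetical and are not part of the original pipeline.

```python
def summarise_depth(depths):
    """Return (mean depth, list of zero-depth intervals as half-open [start, end))."""
    mean_depth = sum(depths) / len(depths)
    gaps = []
    start = None
    for i, d in enumerate(depths):
        if d == 0 and start is None:
            start = i                      # a zero-depth run begins here
        elif d != 0 and start is not None:
            gaps.append((start, i))        # the run ended just before position i
            start = None
    if start is not None:                  # run extends to the end of the sequence
        gaps.append((start, len(depths)))
    return mean_depth, gaps


# Toy example: five positions, two zero-depth runs
mean_depth, gaps = summarise_depth([10, 0, 0, 5, 0])
print(mean_depth, gaps)
```

For a circular molecule, a zero-depth run touching both the first and last positions would correspond to a single gap spanning the junction, so such cases need to be inspected together.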
2.4. Genome Annotation and Visualization
Annotation of the C. victor mitochondrial genome was performed with the MITOS2 web server (http://mitos2.bioinf.uni-leipzig.de/, accessed on 5 September 2021) using the invertebrate mitochondrial genetic code [24]. Protein-coding genes, tRNAs and rRNAs were initially predicted by MITOS2 and then manually inspected and curated. Putative PCGs were checked by BLASTX and BLASTN searches against the NCBI non-redundant protein database and published Lepidoptera mitogenomes to confirm gene identity, reading frame integrity and start/stop codons. tRNA genes were cross-validated using tRNAscan-SE (v2.0) and ARWEN, where necessary, and predicted secondary structures were examined to confirm typical cloverleaf structures [25]. rRNA genes were annotated by BLASTN using known insect mitochondrial rRNA sequences as queries.
All gene boundaries, including overlapping regions and intergenic spacers, were manually curated to ensure consistency with published hesperiid mitogenomes. Circular gene maps indicating gene positions, transcriptional orientation and GC content were generated with OGDRAW (v1.4.1) [26]. For the four additional Celaenorrhinus mitogenomes, annotated sequences were downloaded from GenBank and, where necessary, re-annotated using the same pipeline to ensure comparability.
2.5. Codon Usage Analysis
All 13 mitochondrial PCGs were extracted from the annotated mitogenomes of Celaenorrhinus aspersus (MZ221159), C. consanguineus (OR024665), C. maculosa (KF543077 and OR024663) and C. victor (MZ501805). Only complete coding sequences without internal stop codons and with lengths ≥ 300 bp were retained. For each mitogenome, the retained PCGs were concatenated in-frame, and synonymous codon usage was quantified as relative synonymous codon usage (RSCU), defined as the observed frequency of a codon divided by the expected frequency under equal use of synonymous codons for the same amino acid.
RSCU values were calculated in R using the coRdon package under the invertebrate mitochondrial genetic code. For each codon family, RSCU > 1 was taken to indicate preferentially used codons, whereas RSCU < 1 indicated under-represented codons. To obtain a genus-level view, we merged the RSCU tables of the five Celaenorrhinus mitogenomes and calculated, for each codon, the mean, standard deviation and range of RSCU values across species, as well as the number of preferred codons ending in A/U versus G/C at the third codon position in each species. The resulting RSCU profiles were summarised per amino acid family and compared with published Lepidoptera mitogenome datasets.
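The RSCU definition above can be sketched in a few lines of Python. This is an illustrative re-implementation, not the coRdon workflow used in the study; it assumes that codons are grouped by amino acid under the invertebrate mitochondrial code (NCBI table 5), so that Leu forms a six-codon family and Ser an eight-codon family. The function name `rscu` and the toy sequence are hypothetical.

```python
from collections import Counter, defaultdict

# Invertebrate mitochondrial genetic code (NCBI translation table 5), TCAG order
BASES = "TCAG"
AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AAS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds):
    """RSCU per sense codon: observed count * family size / family total."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    families = defaultdict(list)
    for codon, aa in CODE.items():
        if aa != "*":                         # skip stop codons (TAA, TAG)
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Families absent from the sequence have undefined RSCU
            values[c] = counts[c] * len(codons) / total if total else float("nan")
    return values


# Toy example: three TTA and one TTG in the six-codon Leu family
toy = rscu("TTATTATTATTG")
print(toy["TTA"], toy["TTG"], toy["CTT"])
```

Under this grouping the maximum possible RSCU for a Leu codon is 6, which is why values close to 5 for UUA are attainable.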
2.6. Nucleotide Composition and Codon-Usage Analyses
To characterise base composition, we calculated the overall A, T, G and C contents and the proportions of A + T and G + C for the whole mitogenome; for PCGs, rRNAs and tRNAs; and for the non-coding A + T-rich region. AT-skew and GC-skew were computed as (A − T)/(A + T) and (G − C)/(G + C), respectively, for different genomic partitions.
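As a small worked illustration of the skew formulas above, the following Python sketch computes A + T content, AT-skew and GC-skew for a sequence. The function name `composition` and the toy sequence are hypothetical; ambiguous bases are simply ignored, which is an assumption of this sketch.

```python
def composition(seq):
    """Return A+T fraction, AT-skew and GC-skew for a nucleotide sequence."""
    seq = seq.upper()
    n = {base: seq.count(base) for base in "ACGT"}   # ambiguous bases are ignored
    total = sum(n.values())
    return {
        "AT": (n["A"] + n["T"]) / total,
        "AT_skew": (n["A"] - n["T"]) / (n["A"] + n["T"]),
        "GC_skew": (n["G"] - n["C"]) / (n["G"] + n["C"]),
    }


# Toy example: A=2, T=3, C=2, G=1
print(composition("AATTTCCG"))
```

A negative AT-skew means T exceeds A, and a negative GC-skew means C exceeds G, on the strand being analysed.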
For codon-usage statistics, we extracted nucleotide sequences of the 13 standard mitochondrial PCGs from C. victor and the four congeneric mitogenomes. Start and stop codons were removed, and sequences with internal stop codons or ambiguous nucleotides were excluded. Codon statistics were calculated in R using coRdon, including codon counts per gene, GC content at the first, second and third codon positions (GC1, GC2 and GC3) and the effective number of codons (ENC). RSCU values were computed for each gene and summarised at the species level to visualise codon preferences across Celaenorrhinus. Heatmaps of RSCU values were drawn in R using pheatmap and ggplot2 [27]. Amino acid composition and start/stop codon usage were also summarised across the 13 PCGs for each species.
2.7. ENC–GC3s, Neutrality and Parity Rule 2 (PR2) Analyses
To evaluate the determinants of codon-usage bias in Celaenorrhinus, we combined ENC–GC3s, neutrality and PR2 analyses. For each PCG in each species, observed ENC values were plotted against GC content at synonymous third codon positions (GC3s).
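The Results compare observed ENC values with Wright's expected curve. A minimal Python sketch of that reference curve is given here, using the standard approximation ENC = 2 + s + 29 / (s² + (1 − s)²), where s is GC3s. The function names and the example ENC value are illustrative, and how the study implemented the curve is not stated in the text.

```python
def expected_enc(gc3s):
    """Wright's expected ENC when codon bias is shaped by GC3s composition alone."""
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def below_curve(observed_enc, gc3s):
    """True if a gene's observed ENC lies below the expected curve."""
    return observed_enc < expected_enc(gc3s)


# Illustrative values only: a strongly A+T-biased third position (GC3s = 0.087)
print(round(expected_enc(0.087), 2))
print(below_curve(31.9, 0.087))
```

Points below the curve show stronger codon bias than base composition at third positions alone would predict.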
Neutrality plots were constructed by plotting average GC content at the first and second codon positions (GC12 = (GC1 + GC2)/2) against GC3 for each gene. Linear regressions of GC12 on GC3 were fitted for each species, and the regression slope, intercept and coefficient of determination (R²) were used to infer the relative contributions of mutation pressure and selection/constraint to codon-usage patterns.
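The neutrality analysis can be sketched in plain Python as follows: position-specific GC content is computed per gene, and GC12 is then regressed on GC3 by ordinary least squares. The function names and toy data are hypothetical; the study performed these steps in R with custom scripts.

```python
def gc_by_position(cds):
    """Return (GC1, GC2, GC3) for an in-frame coding sequence."""
    cds = cds.upper()
    cds = cds[: len(cds) - len(cds) % 3]          # drop a trailing partial codon
    fractions = []
    for offset in range(3):
        bases = cds[offset::3]                    # every base at this codon position
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    return tuple(fractions)


def ols(x, y):
    """Least-squares fit of y on x; returns (slope, intercept, R squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)


# Toy example: one gene's positional GC content, then a regression on made-up points
gc1, gc2, gc3 = gc_by_position("GCAGTA")
gc12 = (gc1 + gc2) / 2
print(gc12, gc3)
print(ols([0.05, 0.10, 0.15], [0.25, 0.27, 0.29]))
```

In a neutrality plot, x holds the per-gene GC3 values and y the per-gene GC12 values for one species.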
PR2 bias plots were generated to evaluate strand-specific and compositional asymmetries at third codon positions. For each PCG, we calculated A3/(A3 + T3) and G3/(G3 + C3), where A3, T3, G3 and C3 denote the frequencies of the corresponding nucleotides at third codon positions. Deviations from the central point (0.5, 0.5) indicate unequal usage of complementary base pairs (A vs. T, G vs. C). All plots and regressions were produced in R using custom scripts, and summary statistics (mean ENC, GC3, GC12 and PR2 indices) were compared among the five Celaenorrhinus species.
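The two PR2 coordinates defined above can be computed per gene as in this minimal Python sketch. The function name `pr2` and the toy sequence are hypothetical, and returning `None` for an undefined ratio is a choice made here for illustration.

```python
def pr2(cds):
    """Return (A3/(A3+T3), G3/(G3+C3)) from third codon positions."""
    third = cds.upper()[2::3]                 # bases at codon position 3
    a, t, g, c = (third.count(b) for b in "ATGC")
    at_ratio = a / (a + t) if a + t else None
    gc_ratio = g / (g + c) if g + c else None
    return at_ratio, gc_ratio


# Toy example: third positions are A, T, G, C, T
print(pr2("AAATTTGGGCCCAAT"))
```

A gene at (0.5, 0.5) uses complementary bases equally at third positions; values below 0.5 indicate an excess of T over A or of C over G.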
2.8. Substitution-Rate (Ka/Ks) Analysis of Mitochondrial Protein-Coding Genes
To assess selective pressure on mitochondrial PCGs in Celaenorrhinus, we calculated pairwise nonsynonymous (Ka) and synonymous (Ks) substitution rates among the five mitogenomes, which represent four species (C. victor, C. maculosa [two accessions], C. consanguineus and C. aspersus). For each of the 13 canonical mitochondrial PCGs (atp6, atp8, cox1–3, cytb, nad1–6 and nad4L), CDSs were extracted and saved as gene-wise FASTA files. Before alignment, CDSs were trimmed at the 3′ end so that their lengths were exact multiples of three, thereby removing incomplete stop codons.
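The 3′ trimming step, together with a check for internal stop codons under the invertebrate mitochondrial code (where only TAA and TAG are stops), can be sketched in Python as follows. The function names and toy sequences are hypothetical and do not reproduce the authors' R code.

```python
# Stop codons under NCBI translation table 5 (TGA encodes Trp, AGA/AGG encode Ser)
STOPS = {"TAA", "TAG"}


def trim_to_codons(cds):
    """Trim the 3' end so that the length is an exact multiple of three."""
    return cds[: len(cds) - len(cds) % 3]


def has_internal_stop(cds):
    """True if any codon other than the last one is a stop codon."""
    cds = trim_to_codons(cds.upper())
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return any(codon in STOPS for codon in codons[:-1])


# Toy example: the incomplete stop "TA" is removed by trimming
print(trim_to_codons("ATGAAATA"))
print(has_internal_stop("ATGTAAAAA"), has_internal_stop("ATGTGAAAA"))
```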
Codon-based multiple sequence alignments were generated in R using the function AlignTranslation in DECIPHER under the invertebrate mitochondrial genetic code (NCBI code 5). CDSs that failed translation or contained internal stop codons under this code were excluded from downstream analyses. For each aligned gene, pairwise Ka and Ks values were estimated with the kaks function in the R package seqinr, based on the Nei–Gojobori method with Jukes–Cantor correction. For every gene, we summarised the number of species pairs, mean Ka, mean Ks and mean Ka/Ks (ω) across all valid pairwise comparisons, together with standard deviations. Gene-wise Ka/Ks values (bar plots with standard deviation) and species-by-species heatmaps for selected genes (e.g., atp6) were visualised in R using ggplot2.
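The final step of a Nei–Gojobori estimate, applying the Jukes–Cantor correction to the proportions of nonsynonymous (pN) and synonymous (pS) differences and taking their ratio, can be sketched as below. This Python fragment assumes pN and pS have already been obtained by counting sites and differences, which is not shown; the function names and numerical inputs are illustrative and are not taken from the study.

```python
import math
from statistics import mean, stdev


def jc_distance(p):
    """Jukes-Cantor corrected distance for a proportion of differences p."""
    if p >= 0.75:
        return float("nan")                   # correction undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(p_n, p_s):
    """Ka/Ks ratio from uncorrected proportions pN and pS."""
    ka, ks = jc_distance(p_n), jc_distance(p_s)
    return ka / ks if ks > 0 else float("nan")


# Toy example: three hypothetical pairwise comparisons for one gene
pairs = [(0.010, 0.10), (0.012, 0.12), (0.008, 0.09)]
ratios = [ka_ks(pn, ps) for pn, ps in pairs]
print(round(mean(ratios), 3), round(stdev(ratios), 3))
```

Ratios well below 1 indicate that nonsynonymous changes accumulate more slowly than synonymous ones, the pattern expected under purifying selection.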
2.9. Phylogenetic Analysis Based on Mitochondrial Protein-Coding Genes
To place C. victor within Hesperiidae, we built a mitogenome dataset designed to maximize phylogenetic context while maintaining data comparability: 65 skipper species were selected to cover multiple genera and major subfamilies/tribes of Hesperiidae, and only complete mitogenomes with comparable annotation standards were included (Table S1). The tree was rooted using Junonia intermedia (Nymphalidae) as an outgroup, consistent with its established use for rooting Papilionoidea mitogenome phylogenies. For each species, the 13 mitochondrial PCGs were extracted and saved as separate FASTA files. Nucleotide sequences of each PCG were aligned using MAFFT v7 with the auto option, running alignments in parallel across loci with GNU Parallel (16 threads) to improve efficiency. Obvious alignment ambiguities and poorly aligned positions were removed gene by gene using trimAl with the -automated1 heuristic.
The trimmed single-gene alignments were concatenated into a supermatrix with AMAS, treating each PCG as an independent partition and exporting both the concatenated alignment (mito_CDS_trim.fa) and a corresponding partition file. The AMAS partition file was converted into a NEXUS “charset” block (partitions.nxs) and supplied to IQ-TREE2 for partitioned maximum-likelihood (ML) analysis [26]. ML trees were inferred under a scheme where the best-fit substitution model for each partition and partition merging were selected with ModelFinder (-m MFP+MERGE) [28,29]. Node support was assessed using standard nonparametric bootstrap (BS; 1000 replicates; -b 1000). The final topology was the majority-rule consensus tree generated in IQ-TREE2 (-contree), with node labels indicating bootstrap support values (%). The tree was rooted with Junonia intermedia during visualization in FigTree v1.4.4, and the figure was edited accordingly.
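The conversion of a partition file into a NEXUS charset block can be sketched in Python as follows. The input line format (`name = start-end`, optionally preceded by a data-type prefix such as `DNA,`) is an assumption about the partition file, and the function name and coordinates are hypothetical; the authors' actual conversion script is not described.

```python
import re

# Assumed line format: "p1_atp6 = 1-678" or "DNA, p1_atp6 = 1-678"
LINE = re.compile(r"^(?:\w+\s*,\s*)?(\S+)\s*=\s*(\d+)\s*-\s*(\d+)\s*$")


def partitions_to_nexus(lines):
    """Build a NEXUS sets block with one charset per partition line."""
    out = ["#nexus", "begin sets;"]
    for line in lines:
        match = LINE.match(line.strip())
        if match:                              # silently skip blank or malformed lines
            name, start, end = match.groups()
            out.append(f"    charset {name} = {start}-{end};")
    out.append("end;")
    return "\n".join(out)


# Toy example with made-up coordinates
print(partitions_to_nexus(["p1_atp6 = 1-678", "DNA, p2_atp8 = 679-840"]))
```

The resulting text can be written to a file such as partitions.nxs and passed to the tree-inference program as the partition definition.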
3. Results
3.1. Genomic Organization and Nucleotide Composition
The complete mitochondrial genome of C. victor is a circular double-stranded DNA molecule of 15,180 bp in length (Figure 2), which falls within the size range reported for other hesperiid butterflies. It contains the typical set of 37 mitochondrial genes (13 PCGs, 22 tRNAs and 2 rRNAs) together with a non-coding A + T-rich control region (Figure 2). No gene rearrangements were detected, and gene order and transcriptional orientation are identical to those of previously published Celaenorrhinus and other hesperiid mitogenomes, indicating a highly conserved mitochondrial architecture within the group. The 22 tRNA genes range in length from 69 bp (tRNA-Ser) to 72 bp (tRNA-Lys and tRNA-His). As in other Lepidoptera, the control region is located between the 12S rRNA gene and tRNA-Met and is the most A + T-rich segment of the mitogenome. The overall nucleotide composition of the C. victor mitogenome is strongly biased towards adenine and thymine, with base frequencies of 39.68% A, 39.95% T, 12.63% C and 7.73% G, corresponding to an A + T content of 79.64%. This pronounced A + T bias is comparable to that of other Celaenorrhinus species and is typical of Lepidoptera mitochondrial genomes. Consistent with other skippers, the mitogenome exhibits a slightly negative AT-skew and strongly negative GC-skew, reflecting an excess of T over A and of C over G across most genomic partitions.
All 13 PCGs of C. victor employ standard invertebrate mitochondrial initiation codons. Five genes (atp6, cox2, cox3, cytb and nad4) start with the canonical ATG, four genes (atp8, nad2, nad3 and nad5) use ATT, and three genes (nad1, nad4L and nad6) initiate with ATA, whereas cox1 uses TTG as an alternative start codon, a feature frequently reported in Lepidoptera mitogenomes. All PCGs terminate with the complete stop codon TAA, and no truncated stop codons (T−/TA−) were detected in C. victor. This start/stop codon pattern closely matches that of congeneric species and other hesperiid mitogenomes, highlighting the conserved nature of mitochondrial translation signals within Celaenorrhinus and across skippers more broadly.
3.2. Synonymous Codon Usage Bias Across Celaenorrhinus Mitogenomes
Across the five Celaenorrhinus mitogenomes, RSCU statistics were obtained for 62 sense codons (Figure 3). Each species possessed 27–28 preferred codons with RSCU > 1, and all preferred codons ended in A or U at the third codon position. In contrast, none of the G- or C-ending codons had a mean RSCU > 1 across species, and 27 G/C-ending codons were strongly suppressed with an average RSCU < 0.30. At the genus level, A/U-ending codons showed high average RSCU values (mean ≈ 1.82), whereas G/C-ending codons were rarely used (mean ≈ 0.18).
The most strongly preferred codon in all five species was UUA (Leu2), with RSCU values ranging from 4.76 to 5.00 (mean 4.90 ± 0.11). Other highly over-represented codons were UCU and UCA for Ser2 (mean RSCU 2.86 and 1.92, respectively), GCU for Ala (2.38 ± 0.07), CGA for Arg (2.40 ± 0.10), ACU for Thr (2.40 ± 0.11), GUU for Val (2.22 ± 0.08), GGA for Gly (2.38 ± 0.16) and CCU for Pro (2.30 ± 0.34). Canonical A/U-rich codons for frequently used amino acids—including AUU (Ile; mean RSCU 1.87), UUU (Phe; 1.84), AUA (Met; 1.81) and AAU (Asn; 1.82)—were also consistently preferred in all five species. The set and rank order of preferred codons were highly conserved among the five Celaenorrhinus mitogenomes. With the exception of a borderline codon pair (CCA vs. GGU) whose RSCU values fluctuated around 1.0, every species preferred essentially the same subset of A/U-ending codons for each amino acid family. No species-specific codon preference pattern was detected, indicating highly similar synonymous codon-usage profiles across species at the genus level.
3.3. Codon Usage Bias of Celaenorrhinus Mitochondrial Protein-Coding Genes
To place the C. victor mitogenome into a genus-level context, we compared codon usage across the five Celaenorrhinus species. At the species level, the mean effective number of codons (ENC) ranged from 31.21 to 33.24, with a genus-wide average of 31.94 ± 0.89, indicating a generally strong codon-usage bias in Celaenorrhinus mitochondrial PCGs. Among the 13 PCGs, nad4L (ENC ≈ 27.3) and nad5 (ENC ≈ 29.4) exhibited the strongest bias, whereas cox1 (ENC ≈ 34.4) and cox3 (ENC ≈ 35.2) showed comparatively weaker bias. Overall, 86.7% (52/60) of all CDS had ENC values below 35 (Figure 4A). The third codon positions were extremely A + T-rich in all species. Mean GC content at third positions (GC3) varied only slightly among species, from 0.078 to 0.103 (overall mean 0.087 ± 0.010), and remained much lower than GC1 (≈0.27) and GC2 (≈0.27). C. victor showed the highest GC3 (≈0.103) together with the highest ENC (≈33.24), suggesting a slightly relaxed codon bias and modestly elevated GC3 relative to congeners, but still well within the genus-level pattern of strong A + T bias.
In the ENC–GC3s plot, almost all genes fell below Wright’s expected curve. Across all CDS, the expected ENC under a purely mutational model averaged 36.6, whereas the observed ENC averaged 31.9, and 96.7% (58/60) of points lay below the curve. In each species, at least 11 of 12 genes were below the curve, indicating that factors beyond neutral mutational pressure—such as selection or translational/structural constraints—contribute to shaping codon usage. ENC and GC3 were moderately positively correlated (r ≈ 0.39), meaning that genes with slightly higher GC3 tend to show somewhat weaker codon bias.
Neutrality plots (GC12 vs. GC3) provided further insight. Slopes of the regressions of GC12 on GC3 ranged from 1.34 to 1.98 across species (mean 1.65), but the goodness of fit was low (R² = 0.07–0.32; mean ≈ 0.19). Although GC3 and GC12 were positively correlated (r ≈ 0.40), mutation pressure at third positions explained only a modest proportion of the variation in first- and second-position GC content, implying that natural selection and/or structural constraints on amino acid composition play a substantial role (Figure 4B).
PR2 bias plots consistently deviated from the central point (A3 = T3, G3 = C3). At the genus level, mean G3/(G3 + C3) was ~0.35, indicating a clear excess of C over G at third positions, and mean A3/(A3 + T3) was ~0.43, reflecting a preference for T over A. Together with the low GC3 values, this pattern demonstrates a pronounced bias towards pyrimidine-ending (C- and especially T-ending) codons at third positions, a feature that is highly conserved among all sampled Celaenorrhinus species (Figure 4C).
3.4. Substitution-Rate Patterns Among Celaenorrhinus Mitochondrial Genes
Across all 13 mitochondrial PCGs, mean Ka/Ks ratios were consistently < 1, with gene-wise averages ranging from ~0.02 in cox1 to ~0.17 in atp8, nad4L, nad5 and nad6 (Figure 5A). Cytochrome oxidase genes (cox1–3) and cytb showed the lowest Ka/Ks values, indicating strong functional constraint on these components of the respiratory chain, whereas several NADH dehydrogenase subunits (nad1, nad2, nad4, nad4L, nad5, nad6) and atp8 exhibited comparatively elevated Ka/Ks values, suggesting faster amino-acid evolution in these genes. Nevertheless, even the most rapidly evolving genes remained well below the threshold of ω = 1, implying overall purifying selection.
The atp6 Ka/Ks heatmap (Figure 5B) further illustrates these patterns at the species level. The two accessions of C. maculosa showed the lowest Ka/Ks values, as expected for conspecific genomes, while interspecific comparisons among C. victor, C. maculosa, C. consanguineus and C. aspersus displayed moderately higher but still relatively low Ka/Ks values (~0.06–0.12). This indicates limited amino-acid divergence among Celaenorrhinus species and broadly similar selective regimes across lineages.
3.5. Phylogenetic Placement of Celaenorrhinus victor Within Hesperiidae
The maximum-likelihood (ML) analysis of the concatenated mitochondrial PCGs recovered a rooted hesperiid phylogeny using J. intermedia (Nymphalidae) as the outgroup (Figure 6). Major subfamily- and tribe-level clades within Hesperiidae were recovered, including Coeliadinae (e.g., Burara, Hasora, and Choaspes) and a distinct lineage represented by Euschemon rafflesia near the base of the ingroup. Within Pyrginae s.l., multiple tribe-level groupings were resolved, including a Tagiadini clade comprising Abraximorpha, Darpa, Tagiades, Capila, Coladenia, Mooreana, Pintara, Gerosis, Tapena, and Satarupa, as well as an Erynnis + Pyrgus grouping (Erynnini + Pyrgini) and a cluster containing Carterocephalus, Heteropterus, Leptalina, and Malaza. Hesperiinae taxa (e.g., Ampittia, Pedesta, Sovia, Notocrypta, Parnara, Pelopidas, Potanthus, and Zinaida) formed a separate radiation (Figure 6).
Celaenorrhinini was represented by Aurivittia aurivittata, four Celaenorrhinus species, and two Pseudocoladenia species. In the inferred topology, A. aurivittata was placed basally to a clade comprising Celaenorrhinus and Pseudocoladenia. Within the Celaenorrhinus–Pseudocoladenia assemblage, Pseudocoladenia dea and Pseudocoladenia festa formed a sister pair, which was recovered as sister to a monophyletic Celaenorrhinus clade (Figure 6).
4. Discussion
4.1. General Features of the C. victor Mitogenome
The overall architecture of the C. victor mitogenome reinforces the view that skipper mitochondrial genomes are highly conserved in both gene content and arrangement [16]. Its compact circular molecule, standard complement of 37 genes and single A + T-rich control region mirror the organisation reported for other hesperiid and lepidopteran mitogenomes, suggesting that large-scale rearrangements have played little role in the evolution of Celaenorrhinus [14]. Likewise, the pronounced A + T bias and characteristic compositional skews, together with the use of conventional ATN/TTG start codons and complete TAA stops, point to long-term stability of replication, transcription and translation signals in this lineage [30,31,32]. Rather than revealing unusual or derived structural features, C. victor therefore exemplifies the “canonical” hesperiid mitogenome, which is advantageous for comparative work: it allows us to interpret differences in codon usage, substitution rates and phylogenetic signal in subsequent sections as genuine evolutionary variation among species and genes, rather than as artefacts of gene rearrangement or atypical genome organisation.
4.2. Conserved, A + T-Biased Codon Usage at the Genus Level
The RSCU profiles of the five Celaenorrhinus mitogenomes reveal an extremely conserved and strongly A + T-biased codon-usage pattern at the genus level. All preferred codons end in A or U, whereas all markedly under-represented codons end in G or C. This pattern mirrors the very high A + T content and A/T-skew reported for many Lepidoptera mitogenomes, where NNA and NNU codons dominate and UUA (Leu), UUU (Phe), AUU (Ile), AUA (Met) and AAU (Asn) typically show the highest RSCU values [33,34,35]. Our finding that UUA (Leu2) has by far the highest RSCU (≈4.8–5.0) in every Celaenorrhinus species is fully consistent with mitogenomic surveys in moths and butterflies, including Hesperiidae and other Ditrysia lineages, in which UUA is repeatedly identified as the most over-represented codon in mitochondrial PCGs [14,36].
The near-identity of preferred codon sets across C. aspersus, C. consanguineus, C. maculosa (two accessions) and C. victor suggests that synonymous codon usage has been evolutionarily stable within Celaenorrhinus, despite species-level divergence and differing collection localities. This stability implies that the balance between mutational pressure and selection on mitochondrial translation efficiency is very similar across the genus. In Lepidoptera mitogenomes, codon-usage bias has been shown to correlate strongly with nucleotide composition, especially GC content at third codon positions, indicating that mutation pressure towards A + T is a predominant driver, with translational selection playing a secondary but detectable role in fine-tuning preferred codons [35]. The strong enrichment of A/U-ending codons and suppression of G/C-ending codons that we observe in Celaenorrhinus fit this general pattern and are consistent with our ENC–GC3s and neutrality analyses, which indicate a strong A + T mutational background together with a detectable contribution from selection and/or structural constraints.
From a broader phylogenetic perspective, the genus-level RSCU pattern of Celaenorrhinus closely matches those reported for other skipper butterflies and lepidopteran clades, reinforcing the view that mitochondrial codon usage is a conservative trait at higher taxonomic levels. At the same time, the quantitative RSCU estimates provided here offer a useful reference for future comparative work on hesperiid mitogenomes [13,14]. Because all five mitogenomes (four species, with two C. maculosa accessions) share nearly identical codon preferences, deviations from this pattern in additional Celaenorrhinus species or related genera could signal shifts in underlying mutational regimes, changes in tRNA gene content or lineage-specific selection on mitochondrial translation. In this sense, our study provides the first genus-wide codon-usage baseline for Celaenorrhinus, which can be integrated with phylogenetic and population-level analyses to explore the evolutionary forces shaping skipper mitogenomes.
4.3. Genus-Level Codon Usage Patterns and Evolutionary Implications
Our comparative analysis indicates that Celaenorrhinus mitogenomes exhibit broadly similar codon-usage profiles, with strong overall codon bias (mean ENC ≈ 32) and pronounced A + T enrichment at third codon positions (mean GC3 ≈ 0.09). These values are consistent with the typical A + T bias of insect mitochondrial genomes, and the genus-level comparison further shows that ENC and GC3 vary within a relatively narrow range among the sampled Celaenorrhinus species [37,38]. Overall, this pattern suggests that the compositional background and the magnitude of codon bias are comparable across congeners in our dataset [39,40].
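GC3 in this context is the G + C fraction at third codon positions. As a rough illustration of how such position-specific values can be tallied, the Python sketch below computes GC1, GC2 and GC3 for a single in-frame coding sequence. It is an assumption-laden example rather than the pipeline used here: the function name `gc_by_codon_position` is hypothetical, the terminal stop codon is assumed to have been trimmed beforehand, and the result is plain GC3 over all codons rather than a GC3s value restricted to synonymous sites.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3) fractions for an in-frame coding sequence.

    Hypothetical helper: ambiguous bases are ignored and any incomplete
    trailing codon is dropped.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    fractions = []
    for pos in range(3):
        # Every third base starting at this codon position.
        bases = [b for b in seq[pos:usable:3] if b in "ACGT"]
        gc = sum(b in "GC" for b in bases)
        fractions.append(gc / len(bases) if bases else float("nan"))
    return tuple(fractions)


# Toy example (not real data): codons ATT, TTA, ATA, GGC.
gc1, gc2, gc3 = gc_by_codon_position("ATTTTAATAGGC")
gc12 = (gc1 + gc2) / 2  # mean of first and second positions
```

The mean of GC1 and GC2 (GC12) is the quantity usually plotted against GC3 in a neutrality plot, so the same helper can feed both comparisons.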
Within this distribution, C. victor shows slightly higher GC3 and ENC values than some congeners, indicating a modestly reduced third-position A + T enrichment and weaker codon bias. Importantly, however, its values remain within the observed genus-level range rather than representing an extreme outlier, implying that C. victor conforms to the overall compositional and codon-usage characteristics seen in other Celaenorrhinus mitogenomes [41]. Because such concordance is expected for correctly assembled and annotated mitochondrial PCGs, this comparison is consistent with (but does not by itself prove) typical mitogenome composition for C. victor; assembly reliability is further supported by mapping/coverage-based metrics reported in Methods (Section 2.3).
To explore potential drivers of codon bias, we examined ENC–GC3s, neutrality, and PR2 patterns. In the ENC–GC3s plot, most genes fall below Wright’s expected curve, indicating that variation in GC3 alone is insufficient to explain the observed codon bias under a purely mutational model [42]. Neutrality plots show positive but weak GC3–GC12 relationships, consistent with mutation affecting all codon positions while first and second positions are more constrained, likely reflecting amino-acid and functional constraints in protein-coding genes [43]. PR2 plots show persistent deviations from parity (e.g., C over G and T over A at third codon positions), consistent with strand- and replication-associated asymmetries that have been widely reported in insect mitochondrial genomes; these deviations may also be compatible with additional effects such as differential constraints among synonymous codons [44].
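Two of the quantities discussed above have simple closed forms, sketched below in Python under stated assumptions. The expected ENC uses the commonly cited approximation to Wright's curve, ENC = 2 + s + 29 / (s² + (1 − s)²), where s is GC3s; this form was derived for the standard genetic code, so applying it to mitochondrial genes is an approximation. The PR2 coordinates are taken as G3/(G3 + C3) and A3/(A3 + T3) over all third codon positions, whereas some studies restrict them to four-fold degenerate codons. The function names are hypothetical and are not taken from the software used in this study.

```python
import math
from collections import Counter


def expected_enc(gc3s):
    """Expected ENC under GC3s-driven mutational bias alone (Wright's approximation)."""
    s = gc3s
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def pr2_coordinates(cds):
    """Return (G3/(G3+C3), A3/(A3+T3)) for an in-frame coding sequence.

    Values of 0.5 on both axes correspond to parity; NaN is returned for an
    axis whose two bases are both absent at third positions.
    """
    third = cds.upper().replace("U", "T")[2::3]  # third base of each complete codon
    n = Counter(third)
    gc = n["G"] + n["C"]
    at = n["A"] + n["T"]
    x = n["G"] / gc if gc else math.nan
    y = n["A"] / at if at else math.nan
    return x, y
```

With s = 0.09 the expected ENC is about 36.8, which lies above the genus-level mean ENC of roughly 32 reported in this section; that is the direction one would expect if most genes fall below the curve, although the per-gene comparison is what the plot itself shows. PR2 coordinates below 0.5 on both axes would correspond to the C-over-G and T-over-A excess described above.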
The genus-level similarity in codon-usage indices provides a comparative reference for Celaenorrhinus mitogenomes and highlights gene-to-gene heterogeneity in bias (e.g., stronger bias in nad4L/nad5 versus weaker bias in several cox genes). Future analyses incorporating more Celaenorrhinus taxa and nuclear loci would help test whether the mitochondrial codon-usage patterns observed here parallel nuclear-genome trends and whether among-lineage shifts—if present—associate with ecological or demographic variables.
4.4. Selective Constraints on Celaenorrhinus Mitochondrial Protein-Coding Genes
The Ka/Ks analysis indicates that all 13 mitochondrial PCGs in Celaenorrhinus are evolving under pervasive purifying selection, as no gene in any species pair approaches a Ka/Ks ratio of 1. This pattern mirrors findings from other lepidopteran groups, where mitochondrial genes typically exhibit Ka/Ks < 1, reflecting strong functional constraints on oxidative phosphorylation complexes [45,46]. In our dataset, cox1 and cox2 are the most conserved genes, reinforcing their suitability as DNA barcoding markers and reliable loci for species-level phylogenetics [2,47]. By contrast, atp8 and several NADH dehydrogenase genes (nad4, nad4L, nad5, nad6) show comparatively higher Ka/Ks ratios, a pattern also reported in other butterflies and moths, where these loci tend to evolve faster than cytochrome oxidase genes [48].
Although the Ka/Ks values of atp8 and these NADH dehydrogenase genes remain below 1, their relative elevation suggests slightly relaxed purifying selection or lineage-specific functional tuning on parts of the electron transport chain. Such genes may provide more informative variation for population-level or phylogeographic studies within Celaenorrhinus, complementing the highly conserved cox genes. The genus-wide comparison reveals a consistent hierarchy of selective constraints among mitochondrial genes (strongest on cox1–3 and cytb, and more relaxed on atp8 and NADH dehydrogenase subunits), highlighting both the functional conservation of Celaenorrhinus mitogenomes and the potential of specific PCGs as evolutionary markers at different taxonomic scales [40,49].
4.5. Phylogenetic Implications for Celaenorrhinini and C. victor
Our mitogenome-based phylogeny supports several focal relationships relevant to Celaenorrhinini and the placement of C. victor (Figure 6). At the tribe level, Celaenorrhinini is recovered as a clade comprising Aurivittia, Celaenorrhinus, and Pseudocoladenia, with Aurivittia occupying a basal position and the Celaenorrhinus–Pseudocoladenia assemblage forming the crown group. This pattern is consistent with recent studies integrating mitochondrial genomes with nuclear loci, including work that recovered Aurivittia as the earliest-diverging lineage within Celaenorrhinini and supported the placement of Pseudocoladenia within Celaenorrhinini rather than Tagiadini [15].
Within Celaenorrhinus, the inferred topology suggests two lineages in our sampling: one grouping C. aspersus with C. maculosa, and another comprising C. consanguineus and the newly sequenced C. victor. The affinity between C. aspersus and C. maculosa is concordant with previous mitogenomic studies of Celaenorrhinus and related taxa [41]. Likewise, the sister relationship between C. consanguineus and C. victor provides a working hypothesis for their close evolutionary relationship and offers a molecular framework for future re-examination of diagnostic characters and distributional boundaries, although detailed morphological revision is beyond the scope of the present study [13].
At broader phylogenetic scales, several deeper nodes linking major lineages received only moderate support in our mitogenome-only analysis (bootstrap values of approximately 50–70 for some backbone relationships). Similar patterns of limited resolution at deeper divergences have been noted in other mitochondrial phylogenies of Hesperiidae [12]. Such uncertainty may reflect a combination of factors, including uneven lineage sampling and limited phylogenetic signal for older splits in mitochondrial datasets. Accordingly, while our results robustly support (i) the monophyly of Celaenorrhinini, (ii) the internal structure recovered for Celaenorrhinus in the present sampling, and (iii) the sister relationship between C. consanguineus and C. victor, higher-level relationships among subfamilies and tribes within Hesperiidae should be interpreted cautiously. Future work combining denser taxon sampling with expanded nuclear loci will be important for testing the stability of these deeper nodes and refining the backbone phylogeny of skippers.
5. Conclusions
We assembled and annotated the complete mitochondrial genome of C. victor, which exhibits the canonical gene complement and compositional bias typical of insect mitogenomes. Read mapping provided high overall assembly support. Comparative analyses across available Celaenorrhinus mitogenomes indicate broadly similar codon-usage profiles at the genus level and Ka/Ks patterns consistent with predominant purifying selection on mitochondrial protein-coding genes. Mitogenome-based phylogenetic inference supports the placement of C. victor within a monophyletic Celaenorrhinus clade and recovers a sister relationship between C. victor and C. consanguineus in our sampling. However, several deeper relationships among major hesperiid lineages show only moderate support, suggesting that denser taxon sampling and nuclear loci will be important for refining the backbone phylogeny. Overall, our results provide an additional mitochondrial genomic resource for Celaenorrhinus and a reference framework for future comparative and integrative evolutionary analyses.
References
- 1. Heikkilä M., Kaila L., Mutanen M., Peña C., Wahlberg N. Cretaceous Origin and Repeated Tertiary Diversification of the Redefined Butterflies. Proc. R. Soc. B Biol. Sci. 2011, 279, 1093–1099. doi:10.1098/rspb.2011.1430. PMID 21920981. PMC3267136.
- 2. Zhang X., Liu J., Chiba H., Li Y., Yuan X. Phylogeny and Evolutionary Timescales of the Tribes Tagiadini and Celaenorrhinini (Hesperiidae, Pyrginae) Inferred from Mitochondrial Genome and Nuclear Genes. Zool. Scr. 2024, 53, 632–649. doi:10.1111/zsc.12674.
- 3. Shan B., De Baets B., Verhoest N.E. Butterfly Abundance Changes in England Are Well Associated with Extreme Climate Events. Sci. Total Environ. 2024, 954, 176318. doi:10.1016/j.scitotenv.2024.176318. PMID 39326748.
- 4. Pallottini M., Goretti E., Argenti C., La Porta G., Tositti L., Dinelli E., Moroni B., Petroselli C., Gravina P., Selvaggi R. Butterflies as Bioindicators of Metal Contamination. Environ. Sci. Pollut. Res. 2023, 30, 95606–95620. doi:10.1007/s11356-023-28930-x. PMID 37552448. PMC10482766.
- 5. Tang M., Tan M., Meng G., Yang S., Su X., Liu S., Song W., Li Y., Wu Q., Zhang A. Multiplex Sequencing of Pooled Mitochondrial Genomes: A Crucial Step toward Biodiversity Analysis Using Mito-Metagenomics. Nucleic Acids Res. 2014, 42, e166. doi:10.1093/nar/gku917. PMID 25294837. PMC4267667.
- 6. Timmermans M.J.T.N., Lees D.C., Simonsen T.J. Towards a Mitogenomic Phylogeny of Lepidoptera. Mol. Phylogenet. Evol. 2014, 79, 169–178. doi:10.1016/j.ympev.2014.05.031. PMID 24910155.
- 7. Cameron S.L. Insect Mitochondrial Genomics: Implications for Evolution and Phylogeny. Annu. Rev. Entomol. 2014, 59, 95–117. doi:10.1146/annurev-ento-011613-162007. PMID 24160435.
- 8. Boore J.L. Animal Mitochondrial Genomes. Nucleic Acids Res. 1999, 27, 1767–1780. doi:10.1093/nar/27.8.1767. PMID 10101183. PMC148383.
