Cryptocercus Genomes Expand Knowledge of Adaptations to Xylophagy and Termite Sociality
Alun R C Jones, Alina A Mikhailova, Cédric Aumont, Juliette Berger, Cong Liu, Shulin He, Zongqing Wang, Sylke Winkler, Erich Bornberg-Bauer, Frédéric Legendre, Dino P McMahon, Mark C Harrison

TL;DR
This study uses genomes of Cryptocercus cockroaches to explore how wood-eating and social behaviors evolved, revealing new insights into the genetic changes linked to these traits.
Contribution
The study provides new genomic evidence challenging assumptions about the genetic basis of sociality and xylophagy in Blattodea.
Findings
Relaxed selection is observed in Cryptocercus and termites, suggesting reduced effective population size in subsocial ancestors.
Cryptocercus shows elevated dN/dS ratios, contradicting the expected correlation with social complexity.
A shift in methylation patterns and reduction in Ionotropic Receptors occurred in a common ancestor of Cryptocercus and termites.
Abstract
Subsociality and wood-eating or xylophagy are understood as key drivers in the evolution of eusociality in Blattodea (cockroaches and termites), two features observed in the cockroach genus Cryptocercus, the sister group of all termites. We analyze two high-quality genomes from this genus, C. punctulatus from North America and C. meridianus from Southeast Asia, to explore the evolutionary transitions to xylophagy and subsociality within Blattodea. Our analyses reveal evidence of relaxed selection in both Cryptocercus and termites, indicating that a reduction in effective population size may have occurred in their subsocial ancestors. These findings challenge the expected positive correlation between dN/dS ratios and social complexity, as Cryptocercus exhibits elevated dN/dS values that may exceed those of eusocial termites. Additionally, we infer a reduction in the number of Ionotropic…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Fig. 1
Fig. 2
Fig. 3
Fig. 4| Species | GenBank | Sequencing Technology | Length | GC content | # Scaffolds | N50 | L90 | Genome Completeness |
|---|---|---|---|---|---|---|---|---|
|
| ||||||||
|
|
| ONT, Illumina, Hi-C | 820 Mb | 38.1% | 1288 | 70.22 Mb | 12 | S:99.2%,D:0.4%F:0.2%,M:0.2% |
|
|
| ONT, Illumina, Hi-C | 1189 Mb | 37.9% | 1490 | 45.68 Mb | 23 | S:98.5%,D:1.2%F:0.3%,M:0.1% |
|
| ||||||||
|
|
| Illumina | 1796 Mb | 33.9% | 28,065 | 0.56 Mb | 3635 | S:94.9%,D:0.8%F:3.4%,M:0.9% |
|
|
| ONT | 3396 Mb | 36.7% | 121,120 | 0.04 Mb | 81,580 | S:21.9%,D:0.3%F:32.6%,M:44.8% |
|
|
| PB | 3128 Mb | 34.8% | 10,699 | 1.41 Mb | 3838 | S:93.7%,D:4.5%F:1.0%,M:0.8% |
|
|
| PB, Hi-C | 2088 Mb | 35.4% | 479 | 194.85 Mb | 10 | S:98.5%,D:0.8%F:0.4%,M:0.4% |
|
|
| PB, Illumina | 1487 Mb | 35.1% | 120,496 | 0.04 Mb | 54,940 | S:71.9%,D:0.5%F:20.3%,M:7.1% |
|
|
| ONT, Illumina, Omni-C | 2217 Mb | 34.7% | 41 | 151.22 Mb | 14 | S:98.4%,D:0.9%F:0.4%,M:0.4% |
|
|
| ONT, Illumina, Omni-C | 2671 Mb | 34.4% | 91 | 202.73 Mb | 12 | S:98.2%,D:0.7%F:0.6%,M:0.6% |
|
|
| ONT, Illumina, Omni-C | 2585 Mb | 34.2% | 68 | 199.88 Mb | 12 | S:97.6%,D:0.8%F:0.7%,M:0.9% |
|
|
| Illumina | 1467 Mb | 35.4% | 347,079 | 0.01 Mb | 206,497 | S:34.6%,D:0.6%F:41.0%,M:23.1% |
|
|
| PB, Illumina | 1776 Mb | 35.9% | 254,758 | 0.14 Mb | 94,833 | S:76.3%,D:1.0%F:13.8%,M:8.9% |
|
|
| PB, Illumina | 3232 Mb | 35.4% | 92 | 188.09 Mb | 16 | S:98.8%,D:0.7%F:0.5%,M:0.1% |
|
| ||||||||
|
|
| Illumina | 876 Mb | 39.4% | 12,995 | 1.42 Mb | 853 | S:96.3%,D:1.2%F:1.7%,M:0.8% |
|
|
| Illumina | 1019 Mb | 40.3% | 55,483 | 1.18 Mb | 1006 | S:96.8%,D:0.7%F:1.8%,M:0.6% |
|
| doi | Illumina | 1172 Mb | 37.9% | 145,794 | 2.00 Mb | 653 | S:98.2%,D:0.2%F:1.0%,M:0.6% |
|
|
| Illumina | 811 Mb | 38.7% | 63,274 | 0.04 Mb | 31,995 | S:67.3%,D:0.2%F:24.1%,M:8.1% |
|
|
| PB | 881 Mb | 39.9% | 427 | 44.36 Mb | 19 | S:93.3%,D:0.3%F:0.3%,M:6.2% |
|
|
| Illumina | 881 Mb | 36.8% | 5817 | 1.97 Mb | 616 | S:98.2%,D:0.9%F:0.3%,M:0.6% |
|
|
| Illumina | 485 Mb | 36.5% | 31,662 | 0.75 Mb | 695 | S:98.2%,D:0.4%F:1.2%,M:0.3% |
- —DFG10.13039/501100001659
- —French National Research Agency10.13039/501100001665
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsInsect and Arachnid Ecology and Behavior · Fossil Insects in Amber · Neurobiology and Insect Physiology Research
Introduction
Subsociality, in which one or both parents care for the brood (Wilson 1971; Tallamy and Wood 1986), and xylophagy, the ability to feed on wood, have evolved multiple times within cockroaches (Blattodea) (Legendre and Grandcolas 2018). For example, subsociality has emerged several times within the Blaberidae family, while xylophagy is found in the Nyctiboridae, Blaberidae, and Tryonicidae (Grandcolas 1993; Nalepa and Bell 1997). These two traits sometimes co-occur, at least within the genus Salganea (Blaberidae) and in Cryptocercus (Cryptocercidae) (Nalepa and Bell 1997). The transition to eusociality, on the other hand, which involves alloparental care by nonreproducing relatives, has only occurred once in Blattodea: the termites (Termitoidae). Since Cryptocercus is the sister group to all termites (Lo et al. 2000; Inward et al. 2007; Legendre et al. 2008), it has been suggested that wood-feeding cockroaches that provide biparental care to their offspring represents the ancestral state of termites (Nalepa 2010). Obtaining genomic data for Cryptocercus is therefore crucial for studying the molecular evolution of subsociality and xylophagy in cockroaches, as well as for understanding the circumstances under which the transition from subsociality to eusociality occurred within Blattodea (Legendre and Grandcolas 2018).
Transitions to xylophagy involve complex adaptations necessary for the effective digestion of wood (Dillard and Benbow 2020). This dietary shift frequently requires specific microbial communities and a supportive host gut environment for newly acquired (typically obligate) beneficial symbionts (Engel and Moran 2013). In contrast to other cockroaches, both lower termites and Cryptocercus have obligate protist endosymbionts that are transferred to offspring via anal trophallaxis (Klass et al. 2008; Brune and Dietrich 2015). Investigating how this dependence on gut symbionts has shaped the genomes of Cryptocercus species would provide valuable insight into the evolution of termite-microbial coexistence.
Furthermore, comparing the genomes of Cryptocercus with those of eusocial termites and other cockroaches is necessary to investigate the molecular signatures related to the emergence of subsociality and how they differ from mechanisms related to more complex social phenotypes. Previous studies have found sophisticaed chemical communication to be important for the evolution of eusociality in Hymenoptera, evidenced by high genomic copy numbers of odorant receptors (Simola et al. 2013; Kapheim et al. 2015; Shell et al. 2021), although this relationship between sociality and odorant receptor repertoires has been disputed (Gautam et al. 2024). In termites, on the other hand, ionotropic receptors were predicted to be important for colony communication due to their caste-specific expression profiles, signals of positive selection, and high copy number, albeit reduced compared to the German cockroach, Blattella germanica (Harrison et al. 2018). Furthermore, in contrast to eusocial evolution in Hymenoptera (Shell et al. 2021), termite proteomes reveal significant gene family contractions compared with B. germanica (Harrison et al. 2018) and reductions in regulatory complexity (Jones et al. 2024). This is also accompanied by an overall reduction in genome size (Terrapon et al. 2014; Harrison et al. 2018). Across many origins of eusociality, major genomic changes in transcriptional regulation have been detected compared to noneusocial relatives (Simola et al. 2013; Kapheim et al. 2015; Harrison et al. 2018; Shell et al. 2021). However, shifts in expression patterns are also likely to facilitate the evolution of parental care in subsocial species (Thorne 1997; Johnson and Linksvayer 2010; Rehan and Toth 2015), suggesting that the aforementioned regulatory changes may have undergone evolutionary changes prior to the evolution of eusociality.
Eusociality, where only few individuals reproduce, has been linked to increased dN/dS due to relaxed selection possibly caused by a reduction in the effective population size (Chak et al. 2021; Weyna and Romiguier 2021; Roux et al. 2024). Relaxed selection has been associated with eusociality in Hymenoptera (Hunt et al. 2011; Weyna and Romiguier 2021) and termites (Ewart et al. 2024; Roux et al. 2024), indicating a possible universal link between eusociality and relaxed selection. Subsocial sister groups, such as the subsocial woodroaches Cryptocercus, were, however, mostly lacking in these studies, so that it is currently unknown whether selection was already relaxed in subsocial ancestors. However, for a single species, Cryptocercus wrighti, dN/dS values based on transcriptomes appeared intermediate between solitary cockroaches and eusocial termites (Roux et al. 2024).
The genomic signatures of sociality within the order Blattodea have so far been studied by comparing solitary or gregarious cockroaches with the social termites (Terrapon et al. 2014; Harrison et al. 2018; Mikhailova et al. 2023). The termite genomes revealed changes in transcriptional regulation, chemical communication and patterns of selection as well as reductions in genome and proteome sizes (Mikhailova et al. 2023). Not including Cryptocercus means that these studies cannot determine whether genomic changes occurred at or before the emergence of termites, limiting the ability to investigate the precise relationship between genomic traits and the transition to advanced sociality in termites. To understand the genomic changes that predated the emergence of termites, we analyze two high-quality genomes of subsocial woodroaches along with several termite and cockroach genomes of varying social complexity. We present a novel genome of the North American species, C. punctulatus, and compare this to the genome of the recently sequenced Asian species C. meridianus (Liu et al. 2025). The North American and Asian Cryptocercus clades diverged an estimated 85 million years ago (Che et al. 2022), allowing us to distinguish lineage-specific adaptations from ancestral genomic features that may have contributed to the evolution of eusociality in termites.
Results
Cryptocercus Genomes and Proteomes More Similar to Termites than Cockroaches
We de novo assembled a high-quality genome of the wood-roach Cryptocercus punctulatus using a combination of long ONT and short TELL-seq reads together with Hi-C data. Here we analyze this genome along with a second Cryptocercus genome, C. meridianus, presented in a parallel study (Liu et al. 2025), which was sequenced and assembled using similar methods. High contiguity ( and 70 Mb, and 12), as well as high completeness (Complete BUSCO = 99.5% and 99.7%), were achieved for both genomes, respectively (Table 1).
At 1.2 Gb and 820 Mb, these Cryptocercus genome assemblies are more similar in size to those of six published termite genomes (mean 0.87 Gb) than to those of 11 other published cockroach genomes (mean 2.35 Gb; Table 1; Fig. 1c). For comparative analyses, we concentrated on genomes with published, reliable annotations of protein coding genes, marked with an asterisk in Table 1. Similar to genome sizes, the proteomes of Cryptocercus (18,688 protein-coding genes in C. punctulatus and 15,909 in C. meridianus) and termites (from 12,984 in Coptotermes formosanus to 18,160 in Cryptotermes secundus) are substantially smaller than those of the other three analyzed cockroach genomes. (Blattella germanica: 25,393, Diploptera punctata: 27,939, Periplaneta americana: 27,047; Fig. 1d). Taken together, these findings indicate that a reduction in genome and proteome size occurred in a common ancestor of termites and Cryptocercus, along with or predating the evolution of subsociality and xylophagy.
a) Phylogenetic tree of species used in this study. The tree was created with iqtree2 based on concatenated protein alignments and rooted with Eriosoma lanigerum and Frankliniella occidentalis. Cockroaches, subsocial Cryptocercus, and termites are annotated. b) Mean dN/dS is shown for each species and labeled within the bars. Error bars are standard error of the mean. Medians are shown as black dots. c) Genome size in Gb for each species. d) Gene numbers per species grouped by orthology. e) Boxplots of sequence identity between Cryptocercus and other species; the sequence identity was calculated as a frequency of identical sites in multiple sequence alignments for each single-copy orthogroup.
In our species set of six termites, five cockroaches and two outgroups, we identified a total of 19,999 gene ortholog groups, including 712 single-copy orthologs (SCO) and 1067 Cryptocercus-specific ortholog groups. A further 1109 and 1710 genes were species-specific, singletons in C. meridianus and C. punctulatus, respectively. Within the 712 SCOs, we found higher sequence similarity between Cryptocercus and termites (0.862 to 0.872 median sequence identity to C. punctulatus or C. meridianus) than with other cockroach species (0.760 to 0.789; Fig. 1e). As expected, sequence identity was high between C. meridianus and C. punctulatus (0.967).
Species-specific Characteristics of C. Meridianus and C. Punctulatus
Although both Cryptocercus species show low divergence within single-copy orthologs, we also showed in the previous section that their genomes differ both in length (Table 1) and coding content (Fig. 1d). The genome of the North American C. punctulatus contains around 370 million more nucleotides and almost 3000 more protein-coding genes than its Asian counterpart, C. meridianus. Accordingly, we found a greater number of species-specific genes in the C. punctulatus genome, especially in multi-copy (1693 vs. 336), but also as single-copy, species-specific genes (1710 vs. 1109). These species-specific genes were enriched for DNA integration, DNA replication initiation, DNA-templated transcription, protein phosphorylation, and double-strand break repair in C. punctulatus, while we found no significant functional enrichment within the species-specific genes in C. meridianus.
Relaxed Selection with the Evolution of Subsociality
Previous studies have found a relaxation in selection strength in eusocial species due to an expected decrease in effective population size ( ) compared to noneusocial species (Ewart et al. 2024; Roux et al. 2024). To test whether such a relaxation in selection could be detected as early as the onset of subsociality in Cryptocercus, we analyzed dN/dS among all SCOs. We found generally higher values in Cryptocercus (mean: 0.140 & 0.164; median: 0.089 & 0.112) and termites (mean: 0.113–0.230; median: 0.063–0.105, except on the very short R. speratus branch: 0.0001) than in other nonsubsocial, nonwood-feeding cockroaches (mean: 0.048–0.066; median: 0.035–0.046), indicating that relaxation in selection may have already occurred with the onset of subsociality (Fig. 1a). If increasing social complexity does indeed cause a relaxation of selection via a decrease in , then we would expect a stepwise increase in dN/dS in the three groups: (1) nonsubsocial cockroaches; (2) subsocial Cryptocercus; (3) eusocial termites. To test this, we estimated dN/dS for different branch models with CodeML (Yang 2007), in which we allowed (a) a single dN/dS for all Blattodea (H0), (b) two dN/dS values, one each in all cockroaches and in termites (H2a), (c) or one each in nonsocial cockroaches and in social Blattodea (Cryptocercus plus termites; H2b), as well as (d) three values, testing for differences in dN/dS among nonsubsocial cockroaches, Cryptocercus and termites (H3). Model H2b was favored over H0 most often (618 out of 712 SCOs at FDR < 0.05), followed by H3 (614) and H2a (581), indicating an increase in dN/dS occurred throughout Cryptocercus and termite evolution. Interestingly, when using the H3 model, dN/dS was significantly higher on Cryptocercus branches than on termite branches ( , ), however, this model was only preferred over the H2b model for 144 out of the 712 orthologs. These findings support a potential relaxation of selection throughout termites and Cryptocercus but the data do not, with the current data set, support a link between relaxed selection and increasing social complexity.
In further support of increased dN/dS levels in Cryptocercus using RELAX from the HyPhy suite, we identified 18 of the 712 SCOs to have undergone a significant relaxation of selection (adjusted P-value < 0.05) on Cryptocercus branches (asterisks in Fig. 1a). These genes with significant signatures of relaxed selection contained 5 genes with mitochondrial functions, including two mitochondrial ribosomal proteins and an NADH dehydrogenase subunit of the electron transport chain (Table S1). Accordingly, these genes were enriched for the GO-term “mitochondrial translation”, as well as phosphatidyl phosphate biosynthesis, tRNA processing, primary metabolic process, and organic substance metabolic process (Fig. 2). We found only 5 genes undergoing significant intensified selection on Cryptocercus branches (adjusted P-value < 0.05) that interestingly also included two genes with mitochondrial functions, as well as two genes involved in RNA regulation and a component of the proteasome (Table S1). We recognise that the use of the species tree rather than gene trees may have limitations due to discordance of some nodes. However, our node of interest was concordant in over 90% of gene trees and in all of the significant genes presented here, except one of the genes under intensified selection (OG0004945, Table S1), therefore minimizing the risk of false positives.
GO-terms significantly enriched in orthologs under relaxed, intensified or positive selection on Cryptocercus branches.
Adaptive Evolution in Cryptocercus
Besides signals of relaxed selection, we also found evidence for positive selection in Cryptocercus. With aBSREL from the HyPhy suite, we identified SCOs that underwent positive selection in at least one of the three tested Cryptocercus branches (marked with “*” in Fig. 1a). P-values were corrected for multiple testing among branches for each SCO (Holm-Bonferroni method) and then for each tested branch among SCOs using the FDR method. In this manner we identified 12 SCOs with an FDR < 0.1 on the Cryptocercus branches. These SCOs include several genes involved in transcriptional and translation regulation, such as mitochondrial ribosome components, mRNA splicing factor, a component of DNA polymerase, and a gene involved in small RNA biogenesis (Table S2). The SCOs were enriched for five GO-term functions: glutathione biosynthetic process, DNA-templated DNA replication, peseudouridine synthesis, mRNA processing, and translation (Fig. 2).
Gene Family Evolution
We investigated the evolution of gene family size with CAFE5 (Mendes et al. 2020). In line with the observed reduction in proteome size, we inferred a substantially higher number of total gene family contractions than expansions at the ancestral node of Cryptocercus and termites (316 and 75, respectively), including 54 significant contractions and 33 significant expansions (Fig. 3). At the origin of Cryptocercus, we also inferred a higher number of contractions than expansions (546 & 230). However, a larger number of gene families were significantly expanded (63) rather than contracted (40) (Fig. 3). On each Cryptocercus terminal branch we found similar numbers of expansions and contractions, although expansions were slightly higher in C. punctulatus (expansions: 417, 199 significant; contractions: 308, 133 significant), while contractions were higher in C. meridianus (expansions: 272, 112 significant; contractions: 399, 175 significant; Fig. 3).
Gene family evolution. Shown are the numbers of significant gene family expansions and contractions throughout the analyzed phylogeny. The bar plots show the GO-terms that are significantly enriched among expansions and contractions at the root of Cryptocercus (left) and on all 3 Cryptocercus branches (right).
Among those genes significantly expanded at the origin of Cryptocercus, we found a significant enrichment of 11 GO-terms within the Biological Processes category, including functions related to telomere maintenance, DNA-repair and cell-cycle, as well as tricarboxylic acid cycle, autophagy of mitochondrion and 2-oxoglutarate metabolic process (Fig. 3). When considering all gene families expanded on any of the three Cryptocercus branches, we also found significantly enriched GO-terms related to mRNA processing, lipid transport, negative regulation of phosphorylation and intrinsic apoptotic signaling (Fig. 3).
The genes significantly contracted at the origin of Cryptocercus were enriched for metabolic process and nucleoside phosphate biosynthetic process (Fig. 3). Among gene families significantly contracted on any of the Cryptocercus branches, we also found enrichments of GO-terms related to biological regulation, protein processing and lipoprotein metabolism (Fig. 3).
Ionotropic Receptors are Reduced in Cryptocercus and Termites
Previous studies suggest an expansion of Ionotropic Receptors (IRs) within Blattodea, with hundreds of copies found in the genomes of the German cockroach (Harrison et al. 2018), the American cockroach (Li et al. 2018) and the Pacific beetle cockroach (Fouks et al. 2023). However, these chemoreceptors were shown to be reduced to around 100 in three termite species (Harrison et al. 2018). Similar to previous findings, we predicted over 400 IR copies in each of the three analyzed cockroach genomes (D. punctata: 434; B. germanica: 978; P. americana: 442) and fewer than half as many in five of the analyzed termite species (106 to 213) (R. lucifugus was not analyzed due to missing genomic information). In the Cryptocercus genomes we predicted 199 in C. meridianus and 201 gene copies in C. punctulatus, suggesting that the reduction in the IR repertoire predated the evolution of Cryptocercus and termites. In each of the two outgroups (two Acercaria quite distantly related to Blattodea), we predicted only 28 copies.
Predicted DNA Methylation
We predicted methylation levels of each gene for all of the analyzed genomes by comparing CpG counts to expectations based on proportions of cytosines and guanines in coding regions. A depletion of CpGs is an indication of increased DNA methylation (Bewick et al. 2017). We found a unimodal distribution of values (observed verus expected CpG counts) in the genes of the three nonsubsocial cockroaches but a bimodal distribution in Cryptocercus spp. and in all analyzed termites, except C. meridianus, which may have 3 modes (Fig. 4). We also analyzed genome-wide CpG depletion on all available genomes (Table 1). For this we calculated in all 1kb windows. In contrast to results for genes, we observed similar, unimodal genomic distributions of CpG depletion among Cryptocercus and other cockroaches, while termites revealed distinct, bimodal distributions (Fig. S2). The termite, Macrotermes natalensis (unimodal), and the cockroach, Geoscapheus dilatatus (bimodal), were the exceptions to these patterns. These results indicate that , in contrast to the nonsubsocial cockroaches, a subsection of genes are more strongly methylated than other genes in Cryptocercus and termites. In noncoding regions, on the other hand, Cryptocercus genomes appear more “cockroach-like”.
Distributions of gene CpGo/e values for the analyzed species. Modes are denoted as vertical lines, estimated with the R package: LaplacesDemon(Statisticat 2021). Top row: outgroups; second row (red): nonsubsocial cockroaches; third row (yellow): subsocial cockroaches; fourth and fifth row (blue): termites.
Discussion
To investigate genomic changes during a social transition in Blattodea, we analyzed two high-quality genomes of subsocial wood-feeding Cryptocercus cockroaches which represent the nearest extant sister lineage of termites. With these novel resources, we have been able to test whether previously reported genomic differences between termites and cockroaches are uniquely associated with the evolution of advanced sociality in termites, or whether they coincide with the emergence of wood-feeding and/or subsociality in the common ancestor of termites and Crytocercus. Confirming previous C-value estimations of genome size (Koshikawa et al. 2008), we found that a reduction of genome size likely occurred in the common ancestor of Cryptocercus and termites, either coinciding with or predating transitions to subsociality and wood-feeding. We also found smaller proteomes both within Cryptocercus and termites compared to nonsubsocial cockroaches. This was exemplified by the number of ionotropic receptors (IRs) we detected in the analyzed Blattodean genomes. IRs had previously been inferred to be massively expanded in cockroaches and then secondarily reduced in termites (Harrison et al. 2018; Li et al. 2018; Fouks et al. 2023). Our analyses support a reduction in IRs in both extant Cryptocercus and termite species compared to the three rather generalist nonsubsocial cockroaches included in this study, indicating either that a reduced number of chemoreceptors is linked with more integrated forms of sociality and/or wood-feeding, potentially due to the highly specific and chemically uniform nature of the wood niche. Genome assemblies of other wood-feeding and omnivorous cockroaches and of other more distantly related subsocial cockroaches are necessary to investigate this question further.
Although we concentrate in this study on the evolutionary genomics of the Cryptocerus genus as a sister group to all termites, we also present genomic differences between the two analyzed species, including a larger genome and proteome in C. punctulatus. These differences were mainly related to functions in DNA replication, transcription, and protein phosphorylation. These findings are likely linked to species-specific adaptations in these two geographically distant species. Genomes of further North American and Asian Cryptocercus species are necessary to assist in distinguishing truly species-specific genome characteristics from regional differences.
We show that the previously predicted change in methylation patterns in termites compared to cockroaches (Harrison et al. 2018) may have evolved much earlier than previously thought: in the common ancestor of Cryptocercus and termites, and therefore coinciding with the presumed transition to subsociality. These results support the hypothesis that fundamental gene regulatory changes were necessary for the initial evolutionary transition to subsociality in Cryptocercus (Mikhailova et al. 2023), implying that this was a necessary genomic prerequisite for the evolution of advanced sociality in termites, given that such methylation patterns are conserved across termites. In further support, we present evidence for the adaptive evolution of several genes involved in translation and transcriptional regulation within the Cryptocercus lineage. Whether our proposed methylation patterns or indeed other mechanisms of transcriptional regulation changed to such a dramatic degree in the common ancestor of termites and Cryptocercus needs to be further corroborated with more direct measurements, such as bisulphite sequencing and transcriptomic sequencing across a broader range of blattodean species. Such data will help to illuminate the role of transcriptional regulation in the evolution of subsociality and its potential importance for the emergence of phenotypic plasticity in termites.
Additionally, we present evidence for a relaxation of selection in both Cryptocercus and termite genomes, indicating that a reduction in effective population size may already have occurred in the presumed subsocial, wood-feeding ancestor of this clade. The reasons for this apparent relaxation of selection in Cryptocercus and in termites require further investigation. Our results confirm previous findings of inflated dN/dS values in termites compared to cockroaches (Ewart et al. 2024; Roux et al. 2024) but importantly cast doubt on the expectation of a positive correlation between dN/dS and social complexity due to a hypothesized reduction in the effective population size in more advanced social groups (Weyna and Romiguier 2021; Mikhailova et al. 2023; Ewart et al. 2024; Roux et al. 2024). In contrast to a previous study based on transcriptomes, in which dN/dS values for C. wrighti were intermediate between termites and nonsocial cockroaches (Roux et al. 2024), in the current analysis using whole genomes, we find at least equally high and possibly higher dN/dS values in subsocial Cryptocercus compared to eusocial termites. Our findings suggest that relaxed selection was linked to the transition to wood-feeding and parental care in the common ancestor of termites and Cryptocercus rather than to the emergence of advanced sociality in termites. However, using current data, we are unable to separate the effects of phylogeny, wood-feeding and sociality on dN/dS levels. The integration of further higher quality genomes from species with varying levels of social complexity throughout Blattodea will allow this question to be more rigourously interrogated.
These two Cryptocercus genomes provide important genomic resources both for understanding social evolution and the transition to wood-feeding (Legendre and Grandcolas 2018). We present evidence for genomic adaptations to feeding on wood; a recalcitrant and energy-poor resource. Here, we show that some genomic traits previously associated with termites, such as elevated dN/dS (Ewart et al. 2024) and global patterns of methylation (Harrison et al. 2018) likely evolved much earlier, at the latest in the most recent common ancestor of Cryptocercus and termites. This suggests that the selective pressures linked to the emergence of subsociality, xylophagy as well as the acquisition of novel obligate gut symbionts (Biedermann and Rohlfs 2017), and that understanding the evolutionary origins of termite sociality require a more detailed and focused analysis of gene family as well as gene regulatory evolution. Equally, we encourage increased focus on diversity across Blattodea. The generation of further high-quality genomes from species varying in degrees of (sub)sociality and diet, from species representing independent origins of xylophagy and subsociality, is essential for investigating the genomic mechanisms involved in the evolution of two major evolutionary novelties in this clade: xylophagy and eusociality.
Methods
Genome Sequencing
Extraction of Genomic DNA
Cryptocercus punctulatus individuals were collected in Virginia (USA) and reared in the ISYEB laboratory. High molecular weight genomic DNA (HMW gDNA) was extracted from snap-frozen single legs of C. punctulatus with the Circulomics Insect Big DNA kit (protocol version v0.20a). In brief, one leg was homogenized on ice for 10s with the TissueRuptor (Qiagen), and the homogenized tissue was lysed in buffer CT (Circulomics) with Proteinase K. RNA was removed by RNAse A treatment. HMW gDNA was captured with the Circulomics Nanodisks and eluted in elution buffer. This protocol resulted in high-quality HMW gDNA of about 170 Kb in size when analyzed with the Agilent Femtopulse capillary electrophoresis system.
Oxford Nanopore (ONT) Sequencing of HMW gDNA
One ONT library of HMW gDNA of C. punctulatus was prepared as recommended by Oxford Nanopore according to the Nanopore protocol “Genomic DNA by ligation” (SQK-LSK110, Version: GDE_9108_v110_revH_10Nov 2020). In brief, approximately 4 μg of HMW gDNA was used for the ONT library preparation, after applying a DNA repair and end repair step, ONT sequencing adapters were ligated to the HMW gDNA. The final ONT library was loaded twice on a PromethION FLO-PRO002 flow cell with the reload after 24 hours and ran in total for 3 days on the PromethION instrument of the Vienna Biocenter Core Facilities. Sequencing yield was 33 Gbp, average length of the library about 7.5 kb.
TELL-Seq Whole Genome Sequencing
One indexed paired-end Transposase Enzyme Linked Long-read Sequencing (TELL-Seq) whole genome library was prepared from 10 ng of HMW gDNA extracted from one snap-frozen single leg of C. punctulatus following the TELL-Seq WGS Library Prep User Guide (Document # 100005 v6). In brief, fragments of HMW gDNA were barcoded with unique molecular identifiers (= barcodes) making use of barcoded microbeads by a transpose. This barcoding step is followed by a second transposase step to fragment the barcoded gDNA and to add an indexing barcode. After amplification and further clean-up, the resulting barcoded linked-read TELL-Seq library ran on the NovaSeq 6000 of the DRESDEN concept genome center on an S4 flow cell with 2x 150 cycles.
Hi-C Confirmation Capture
Chromatin confirmation capturing was done making use of the ARIMA-HiC 2.0 kit and followed the user guide for animal tissues (ARIMA Document, Part Number: A160162 v00). In brief, 1 flash-frozen powdered leg of C. punctulatus was cross-linked chemically. 760 ng cross-linked genomic DNA was digested with the ARIMA-HiC 2.0 restriction enzyme cocktail consisting of four restriction enzymes. The 5’-overhangs were filled in and labeled with biotin. Spatially proximal digested DNA ends were ligated and finally the ligated biotin containing fragments were enriched and went for Illumina library preparation, which followed the ARIMA user guide for Library preparation using the KapaR Hyper Prep kit (ARIMA Document Part Number A160139 v00). The barcoded Hi-C library ran on an S4 flow cell of a NovaSeq6000 with 2x 150 cycles.
Assembly
The assembly of C. punctulatus genome was performed as described in a previous study (Liu et al. 2025). In short, the long ONT reads were assembled using Flye version 2.9 with default settings (Kolmogorov et al. 2019). The quality of the assembly was evaluated using QUAST version 5.0.2 (Mikheenko et al. 2018) and BUSCO version 5.1.2 (Manni et al. 2021) after each assembly step. The assembly was polished with short TELL-seq reads using HyPo version 1.0.3 (Kundu et al. 2019) with default settings. To resolve regions with high heterozygosity, we used purge haplotigs (Roach et al. 2018). Initially, a coverage histogram was created using the command “purge_haplotigs hist”. Subsequently, coverage was analyzed for each contig individually with the command “purge_haplotigs cov -l 5 -m 70 -h 250” with the argument values chosen based on the coverage histogram. The scaffolding was done using Hi-C data and the following programs: mapped Hi-C reads with BWA-MEM (Li 2013), processed with pairtools (Open2C et al. 2024) with the command “pairtools sort –nproc 16”. PCR and optical duplicates were removed with the command “pairtools dedup –nproc-in 8 –nproc-out 8 –mark-dups”. The mapped reads were used to generate scaffolded genome assembly with YaHS version 1.2a.2 with default settings (Zhou et al. 2023).
Annotation
The C. punctulatus genome was annotated using the same pipeline as for C. meridianus (Liu et al. 2025). First, repeat elements were detected by RepeatModeler v.2.0.3 (Smit et al. 2015b) and masked by RepeatMasker v.4.1.2 (Smit et al. 2015a). The masked genome was used for identifying protein-coding genes, which integrated protein-genome alignments, transcripts-genome alignments and ab initio gene predictions. For protein-genome alignments, we collected protein sequences of (1) 45 termite species, C. meridianus and Blatta orientalis Liu et al. (2025), (2) 10 model insects (Aedes aegypti, Anopheles gambiae, Drosophila melanogaster, Acyrthosiphon pisum, Apis mellifera, Nasonia vitripennis, Solenopsis invicta, Bombyx mori, Manduca sexta, Tribolium castaneum) from NCBI, and (3) four Blattodeans (Blattella germanica, Cryptotermes secundus, Zootermopsis nevadensis, Coptotermes formosanus) from InsectBase (Mei et al. 2022), and mapped them to the genome using miniprot v.0.7 (Li 2023). For transcript-genome alignments, RNA reads were mapped to the masked genome using Hisat2 v.2.2.1 (Kim et al. 2019) and assembled using StringTie v.2.1.4 (Pertea et al. 2015). As for ab initio predictions, we incorporated (1) AUGUSTUS v.3.4.0 (Stanke and Waack 2003) trained with gene structures identified from transcript-genome alignments by TransDecoder v.5.6.0 (Haas 2013), (2) BRAKER v.2.1.6 (Hoff et al. 2019) trained with RNA read-genome mapping from Hisat2, (3) GALBA v.1.0.1 (Brůna et al. 2023) trained with protein sequences of the four Blattodeans from InsectBase. Then protein-genome alignments, transcripts-genome alignments, gene structures from TransDecoder and ab initio gene predictions were integrated into consensus gene structures by EvidenceModeler v.1.1.1 (Haas et al. 2008). Genes structures were removed if they were inferred by only one ab initio predictor and lack additional evidence from other sources. These gene structures and transcripts assembled by StringTie were processed with two iterations of PASA v.2.5.2 (Haas et al. 2008). Genes were excluded if they have incomplete reading frame or in-frame stop codon. Finally, we extracted the longest protein isoform of each gene, searched them against the nonredundant (nr) database using DIAMOND v2.1.7.161 (Buchfink et al. 2015), and processed the homology hits using MEGAN6 (Huson et al. 2007), which assigned peptides to taxa. Scaffolds were excluded as contaminations if above 90% of genes on them were assigned to Bacteria, Archaea or viruses.
Preparation of Proteomes for Analyses
For further analyses, we added the existing genomic data of 3 cockroaches and 6 termites as well as two outgroup species (Eriosoma lanigerum and Frankliniella occidentalis) (Poulsen et al. 2014; Terrapon et al. 2014; Harrison et al. 2018; Li et al. 2018; Itakura et al. 2020; Rotenberg et al. 2020; Biello et al. 2021; Shigenobu et al. 2022; Fouks et al. 2023; Martelossi et al. 2023). We extracted the proteomes with gffread (Pertea and Pertea 2020), then kept only the longest isoforms and filtered out pseudogenes using DW-Helper scripts (https://domain-world.zivgitlabpages.uni-muenster.de/dw-helper/).
Orthology Detection
To investigate selection and gene family dynamics we identified orthologs using orthofinder2 at default settings (Emms and Kelly 2015). Single copy orthologs (SCOs) were used for the subsequent selection analyses, while inferred gene copy numbers within orthogroups were used as input for inferring gene family evolution.
Selection Analysis
Protein sequences of all SCOs were aligned using PRANK (Löytynoja 2014) and GUIDANCE at default settings (Penn et al. 2010). The corresponding coding sequences were aligned based on these protein alignments, using pal2nal (Suyama et al. 2006) and trimmed using Gblocks with the settings -t=c -p=y -b3=1000000 -b4=2 -b5=h (Talavera and Castresana 2007). A species tree was inferred based on a partitioned, concatenation of the SCO protein alignments, using iqtree, v2.1.3, in the extended model selection mode, merging similar partitions (-m MFP+MERGE) (Minh et al. 2020). This species tree and the coding sequence alignments were used as input for selection analyses. The species tree was used in order to infer evolutionary mechanisms on specific branches of interest, while recognizing that some gene trees may be discordant. Furthermore, the species tree is more likely to have well supported, fully resolved nodes, important for these analyses. Discordance of the node of interest was investigated by comparing all gene trees with the species tree using iqtree2 (–gcf). Overall dN/dS ratios were calculated using the free-ratios model in codeml ( , ) (Yang 2007). Branch-specific relaxed and intensified selection was inferred on three foreground branches using RELAX (Wertheim et al. 2015). P-values were corrected among orthologs for mulitple testing with the FDR method. Positive selection was investigated with aBSREL (Smith et al. 2015), with P-values corrected among branches with the Holm-Bonferroni method. We subsequently corrected these adjusted P-values for each tested branch among orthologs, using the FDR method.
Gene Family Evolution
Gene family dynamics were investigated using CAFE5 with a single K and lambda value having the highest maximum likelihood (Mendes et al. 2020). Gene counts were taken from the orthofinder output. An ultrametric species tree was created from the iqtree-generated species tree using the orthofinder script make_ultrametric.py and an estimated time of 383 MY for the tree root based on median divergence between the outgroup Eriosoma lanigerum and the termite M. natalensis (source: timetree.org). To verify the accuracy of this ultrametric tree, a further time-calibrated tree was created with chronos() from the R library ape, using an additional calibration at the ancestral node of Cryptocercus and termites (130.0–185.1 MYA, source: timetree.org). The thus constructed tree was topologically identical to our original ultrametric tree based on a Robinson-Foulds distance of zero, increasing confidence in its accuracy.
Functional Enrichment
Protein domain scans were carried out using Pfamscan (Finn et al. 2014) and interproscan (Mitchell et al. 2015) with GO terms extracted using Pfam2GO. Individual GO universes were created for both selection and CAFE analyses, while enrichment was conducted with TopGO, using the weight algorithm (Alexa and Rahnenführer 2009).
Gene Divergence
Sequence identity of Cryptocercus genes compared to other analyzed blattodean genomes was calculated based on the alignments of single-copy orthologs from the previous step. We used “seqidentity” function from the R package bio3d to calculate the percentage of identical sites within each orthologous group (Grant et al. 2006). To calculate relative differences in sequences, we divided loss of identity (1 - sequence identity) by divergence times between pairs of species. Divergence times were taken from timetree.org (accessed September 2024) based on (Bourguignon et al. 2018) for most species and (Forni et al. 2021) for dating the Blaberoidea species B. germanica and D. punctata.
Predicted Methylation Profiles
Relative levels of CpG depletion were calculated per gene using a custom Python script. In brief, for each gene, the number of observed CGs in the coding region was divided by the expected number of CGs ( ). is calculated from the the product of the individual proportions of Cs and Gs in the sequence.
Annotation of Ionotropic Receptors
Ionotropic receptors were annotated in the analyzed genomes with Bitacora, version 1.4 (Vizueta et al. 2020). Protein sequences of known IRs were taken from (Fouks et al. 2023) and aligned with mafft, v. 7.505 (Katoh et al. 2002), with the settings: –genafpair –maxiterate 16. A hidden markov model was created from the alignment with hmmbuild from the hmmer suite, v. 3.3 (Mistry et al. 2013). Bitacora was run on this hmm and all annotated protein sequences for each species, using an e-value threshold of 1e-05, including GeMoMa, v. 1.7.1 (Keilwagen et al. 2019), a maximum intron length of 15 000, and a filter length of 30 amino acids.
Supplementary Material
evag028_Supplementary_Data
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Alexa A, Rahnenführer J. Gene set enrichment analysis with top GO. Bioconductor Improv. 2009:27:1–26.
- 2Bewick AJ, Vogel KJ, Moore AJ, Schmitz RJ. Evolution of DNA methylation across insects. Mol Biol Evol. 2017:34:654–665. 10.1093/molbev/msw 264.28025279 PMC 5400375 · doi ↗ · pubmed ↗
- 3Biedermann PH, Rohlfs M. Evolutionary feedbacks between insect sociality and microbial management. Curr Opin Insect Sci. 2017:22:92–100. 10.1016/j.cois.2017.06.003.28805645 · doi ↗ · pubmed ↗
- 4Biello R et al A chromosome-level genome assembly of the woolly apple aphid, Eriosoma lanigerum Hausmann (Hemiptera: Aphididae). Mol Ecol Resour. 2021:21:316–326. 10.1111/men.v 21.1.32985768 · doi ↗ · pubmed ↗
- 5Bourguignon T et al Transoceanic dispersal and plate tectonics shaped global cockroach distributions: evidence from mitochondrial phylogenomics. Mol Biol Evol. 2018:35:970–983. 10.1093/molbev/msy 013.29420807 · doi ↗ · pubmed ↗
- 6Brůna T et al Galba: genome annotation with miniprot and AUGUSTUS. BMC Bioinformatics. 2023:24:327. 10.1186/s 12859-023-05449-z.37653395 PMC 10472564 · doi ↗ · pubmed ↗
- 7Brune A, Dietrich C. The gut microbiota of termites: digesting the diversity in the light of ecology and evolution. Annu Rev Microbiol. 2015:69:145–166. 10.1146/micro.2015.69.issue-1.26195303 · doi ↗ · pubmed ↗
- 8Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2015:12:59–60. 10.1038/nmeth.3176.25402007 · doi ↗ · pubmed ↗
