Whole-Genome Identification and Investigation of DNA Methylation Sites in Nosema ceranae
Jianfeng Qiu, He Zang, Kaiyao Zhang, Nian Fan, Yunzhen Yang, Haimei Yue, Dafu Chen, Rui Guo

TL;DR
This study identifies DNA methylation patterns in Nosema ceranae, a honeybee parasite, to better understand its role in infection and behavior.
Contribution
The first whole-genome DNA methylation analysis in Nosema ceranae using Oxford Nanopore Technology.
Findings
Identified 140,711 CpG, 170,035 CHG, and 1,053,635 CHH methylation sites in the N. ceranae genome.
Observed methylation in repetitive regions and gene regions, suggesting regulatory roles.
Detected three 5mC motifs, providing insights into potential epigenetic mechanisms.
Abstract
Nosema ceranae is a fungal parasite that infects honeybees and contributes to colony collapse. However, the role of DNA methylation, an important chemical modification that regulates gene expression, has not been well understood in this organism. In this study, we used advanced sequencing technology to identify DNA methylation patterns across the entire genome of N. ceranae. We discovered a significant number of methylation sites in the genome, including different regions associated with genes and repetitive DNA sequences. These findings lay the groundwork for understanding how DNA methylation may affect the behavior and infection process of this parasite. This research is crucial for developing better strategies to control the impact of N. ceranae on honeybee populations, which are vital for pollination and global food production. DNA methylation is a key epigenetic modification…
Funding
- National Natural Science Foundation of China
- Earmarked fund for China Agriculture Research System
- Natural Science Foundation of Fujian Province
Taxonomy
Topics: Insect symbiosis and bacterial influences · Epigenetics and DNA Methylation · Genomics and Phylogenetic Studies
1. Introduction
DNA methylation is a key epigenetic regulatory mechanism present in plants, animals, and microorganisms [1], including Homo sapiens [2], Mus musculus [3], Oryza sativa [4], and Phytophthora infestans [5]. DNA methylation is broadly classified into several forms, including N4-methylcytosine (4mC), 5-methylcytosine (5mC), and N6-methyladenine (6mA) [6]. In animals and plants, DNA methylation plays essential roles in various processes, such as cellular differentiation, genomic imprinting [7], and embryogenesis [8,9].
Although research on DNA methylation in fungi is still in its early stages, emerging evidence highlights its critical functions in the regulation of gene expression, development, and reproduction. For example, in some fungi, differential DNA methylation has been shown to play a role in regulating hyphal growth and conidiation [10]. Nosema ceranae, a single-celled fungal parasite, contributes to honeybee colony collapse [11]. Recent research on N. ceranae has encompassed transcriptomics [12], gene function [13], non-coding RNA identification [14,15], and host impact [16]. However, DNA methylation, a critical mode of epigenetic regulation in fungi, has not yet been studied in N. ceranae.
In this study, we employed third-generation sequencing, Oxford Nanopore Technology (ONT), to conduct whole-genome DNA methylation sequencing on N. ceranae, identifying 5mC sites across the genome. Our findings enhance the understanding of fungal DNA methylation and provide a foundation for exploring the epigenetic regulatory roles and mechanisms of DNA methylation in N. ceranae proliferation and infection.
2. Materials and Methods
2.1. Fungi
The N. ceranae spores used in this study were previously purified and stored at the Honeybee Protection Laboratory, College of Bee Science and Biomedicine, Fujian Agriculture and Forestry University, Fuzhou, China.
2.2. Genomic DNA Extraction and Nanopore Sequencing
Following the procedure outlined by Tourancheau et al. [17], total DNA was extracted from spore samples of the microsporidian infecting the eastern honeybee using the QIAamp DNA Microbiome Kit (QIAGEN, Hilden, Germany). The extracted DNA was purified using the DNA Clean & Concentrator-5 (Zymo Research, Irvine, CA, USA), eluted in 10 mM Tris-HCl (pH 8.5) with 0.1 mM EDTA, and then treated with RNase A. DNA purity was assessed with a Nanodrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA), concentration with a Qubit 3.0 fluorometer (Thermo Fisher Scientific, Waltham, MA, USA), and integrity by 0.35% agarose gel electrophoresis. The library was prepared from 1 μg of input DNA using a ligation kit (Oxford Nanopore Technologies, Kidlington, UK), with adapter ligation performed at 24 °C for 30 min. Finally, the samples were sequenced on the R9.4.1 MinION platform using MinKNOW (version 1.5.12).
2.3. Data Quality Control
Post-sequencing data were quality-controlled following Zhang et al. [18]: (1) The multi_to_single_fast5 tool from https://github.com/nanoporetech/ont_fast5_api (accessed on 15 February 2023) was used to convert multi-read fast5 files into individual files, with each read corresponding to a separate fast5 file. (2) Guppy 5.0.16 software was employed for base calling with the dna_r9.4.1_450bps_hac_modbases_5mC_6mA.cfg model, converting fast5 format data to fastq format. (3) The fastq data were further filtered to remove adapters, short fragments (length < 500 bp), and low-quality reads, yielding high-quality clean reads for subsequent analysis. Subsequently, clean reads were aligned to the reference genome of the microsporidian (assembly ASM98816v1, Nosema ceranae BRL01) employing the Split-Reads program of Minimap2 software (2.25). Sequencing depth and alignment efficiency were assessed from the positions of clean reads on the reference genome.
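The read-filtering step in (3) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, the 500 bp length cutoff comes from the text, and the mean-quality cutoff (`min_mean_q=7.0`) is an assumed placeholder because the paper does not state the value it used. Adapter trimming is not shown.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (header, sequence, quality) triples from an iterable of FASTQ lines."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def mean_phred(qual, offset=33):
    """Arithmetic mean of Phred scores decoded from a Phred+33 quality string."""
    return mean(ord(c) - offset for c in qual)


def filter_reads(records, min_len=500, min_mean_q=7.0):
    """Keep reads at least min_len bases long with mean quality >= min_mean_q.

    min_len=500 follows the text; min_mean_q is an assumed value.
    """
    for header, seq, qual in records:
        if len(seq) >= min_len and mean_phred(qual) >= min_mean_q:
            yield header, seq, qual
```

In practice the records would come from an open fastq file handle passed to `parse_fastq`; dedicated tools usually average error probabilities rather than raw Phred scores, so the simple mean used here is only an approximation.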
2.4. DNA Methylation Site Detection
To ascertain DNA methylation status, Nanopolish software (0.13.2) [19] was employed to detect CpG methylation through a hidden Markov model. Specifically, the nucleotide string on the reference genome to be analyzed for methylation status is designated as SR (this sequence must contain at least one CpG site and its five adjacent bases). The methylation status of the SR region was determined by aligning base-called reads to the reference genome and assessing the probability of coverage within the SR region. To define a CpG site as methylated, we applied two strict thresholds: minimum read coverage ≥ 10× and log-likelihood ratio (LLR) ≥ 2.0. Only CpG sites meeting both criteria were retained for analysis. Tombo software (v1.5.1) [20] was utilized to identify CHG and CHH (H = A/T/C) sites: it first re-squiggles the sequencing data and then uses its alternative model to detect these sites. For CHG and CHH sites, we required a minimum read coverage ≥ 10× and an LLR ≥ 3.0 (a higher LLR cutoff was used to minimize false positives for low-frequency non-CpG methylation events). Additionally, we filtered out CHG/CHH sites with a methylation frequency below 10% to exclude stochastic methylation signals.
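The sequence contexts and thresholds described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Nanopolish or Tombo implementation: `SiteCall`, `cytosine_context`, and `passes_thresholds` are hypothetical names, and each site is assumed to carry one summary LLR and one methylation frequency, which the text does not specify.

```python
from dataclasses import dataclass


def cytosine_context(seq, i):
    """Classify the cytosine at index i of a forward-strand sequence.

    Returns "CpG", "CHG", "CHH", or None when the base is not C, the
    context runs off the sequence end, or an ambiguous base is present.
    H stands for A, C, or T.
    """
    if seq[i] != "C":
        return None
    nxt = seq[i + 1:i + 3]
    if nxt[:1] == "G":
        return "CpG"
    if len(nxt) < 2 or nxt[0] not in "ACT":
        return None
    if nxt[1] == "G":
        return "CHG"
    return "CHH" if nxt[1] in "ACT" else None


@dataclass
class SiteCall:
    context: str      # "CpG", "CHG", or "CHH"
    coverage: int     # number of reads covering the site
    llr: float        # summary log-likelihood ratio (assumed per-site value)
    frequency: float  # fraction of reads called methylated


# (minimum coverage, minimum LLR, minimum frequency) per context, from the text.
# No frequency filter is stated for CpG, so 0.0 is used there.
THRESHOLDS = {
    "CpG": (10, 2.0, 0.0),
    "CHG": (10, 3.0, 0.10),
    "CHH": (10, 3.0, 0.10),
}


def passes_thresholds(site):
    """Return True when a site meets all thresholds for its context."""
    min_cov, min_llr, min_freq = THRESHOLDS[site.context]
    return (site.coverage >= min_cov
            and site.llr >= min_llr
            and site.frequency >= min_freq)
```

For example, a CHH site with 12× coverage, LLR 3.5, and frequency 0.05 would be dropped by the 10% frequency filter even though it meets the coverage and LLR cutoffs.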
2.5. Analysis of DNA Methylation Levels
Chromosomes were segmented into 100 Kb windows, and the average methylation level of each window was calculated and displayed genome-wide. RepeatMasker software (4.1.4) [21] was used to predict repeat regions, which, along with flanking regions (2 Kb upstream and downstream), were divided into 50 bins for average methylation level calculations. Because methylation near gene boundaries may influence transcription, the methylation distribution was also analyzed around the transcription start site (TSS) and transcription termination site (TTS). These regions were likewise divided into 50 bins, and the average methylation level of each bin was plotted.
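The two averaging schemes described here (fixed 100 Kb windows along each contig, and 50 equal-width bins across a region) could be sketched as below. The function names and the input layout are assumptions made for illustration; positions are taken to be 0-based, and the text does not say whether the 50 bins span a region together with its flanks or each part separately, so `bin_means` simply bins whatever interval it is given.

```python
from collections import defaultdict


def window_means(sites, window=100_000):
    """Average methylation level per fixed-size window.

    sites: iterable of (contig, position, level) tuples.
    Returns {(contig, window_index): mean level}.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for contig, pos, level in sites:
        entry = sums[(contig, pos // window)]
        entry[0] += level
        entry[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


def bin_means(points, start, end, n_bins=50):
    """Average level in n_bins equal-width bins over the interval [start, end).

    points: iterable of (position, level). Bins without sites yield None.
    """
    width = (end - start) / n_bins
    totals = [0.0] * n_bins
    counts = [0] * n_bins
    for pos, level in points:
        if start <= pos < end:
            b = min(int((pos - start) / width), n_bins - 1)
            totals[b] += level
            counts[b] += 1
    return [t / c if c else None for t, c in zip(totals, counts)]
```

For a repeat with 2 Kb flanks, one option is to call `bin_means` three times (upstream flank, repeat body, downstream flank) so that elements of different lengths are rescaled onto a common axis before averaging across elements.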
2.6. Analysis of Sequence Features
Following Zhang et al. [22], the 1000 CpG, CHG, and CHH sites with the highest methylation levels were selected for sequence feature analysis using WebLogo software (3.7.12).
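As a rough illustration of the input preparation for a sequence logo, the sketch below selects the highest-methylation sites, extracts a window centered on each methylated cytosine, and tallies bases per position. The helper names are hypothetical, positions are assumed to be 0-based on the forward strand, and reverse-strand sites (which would need reverse-complementing) are not handled. The default flank of 4 gives the 9-bp window mentioned in the Results.

```python
from collections import Counter


def top_sites(sites, n=1000):
    """Return the n (position, level) pairs with the highest methylation level."""
    return sorted(sites, key=lambda s: s[1], reverse=True)[:n]


def centered_window(seq, pos, flank=4):
    """Return the (2 * flank + 1)-bp window centered on pos, or None near contig ends."""
    if pos - flank < 0 or pos + flank + 1 > len(seq):
        return None
    return seq[pos - flank:pos + flank + 1]


def position_counts(windows):
    """Count bases at each column of equal-length windows."""
    length = len(windows[0])
    return [Counter(w[i] for w in windows) for i in range(length)]
```

The extracted windows can be written one per line to a text file and passed to a logo tool, or the per-column counts can be inspected directly to see which bases dominate next to the methylated cytosine.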
3. Results
3.1. High-Quality Sequencing Data and Genome-Wide DNA Methylation Profiling of N. ceranae
In this study, we obtained 726,558 clean reads, with an N50 value of 9233 bp and an N90 value of 5584 bp. The average read quality was 8.96, and the average length of clean reads was 8175 bp (Figure 1A). The clean reads aligned to the reference genome (assembly ASM98816v1, Nosema ceranae BRL01) with an alignment rate of 95.75%, indicating high data quality suitable for subsequent analysis. Furthermore, genome coverage increased concomitantly with sequencing depth, confirming that the achieved depth was sufficient for the detection of numerous methylation sites (Figure 1B). Clean reads were distributed across all N. ceranae chromosomes, exhibiting the highest read distribution on contig NW_020169296.1 and the lowest on NW_020169325.1, which indicated good sequencing randomness (Figure 1C). Following quality control, we identified a total of 140,711 CpG, 170,035 CHG, and 1,053,635 CHH methylation sites.
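For readers unfamiliar with the N50 and N90 statistics quoted above, the following Python sketch shows the usual definition: the read length at which the cumulative length of reads, sorted from longest to shortest, first reaches 50% (or 90%) of all sequenced bases. The function name `nx` is hypothetical, and this is a generic illustration rather than the software the authors used.

```python
def nx(lengths, x=50):
    """Return the Nx statistic for a collection of read lengths.

    Reads are sorted from longest to shortest; the result is the length of
    the read at which the running total first reaches x% of all bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= total * x:
            return length
    raise ValueError("lengths must not be empty")
```

By this definition N90 can never exceed N50, which matches the reported values of 5584 bp and 9233 bp.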
3.2. Profiling of 5mC Methylation in Repetitive Regions, Gene Elements, and Sequence Motifs
The methylation levels displayed a broad dynamic range: CpG averaged 0.015 (range: 0.002–0.155), CHG averaged 0.15 (range: 0.0258–0.8853), and CHH averaged 0.24 (range: 0.0099–0.78295) per bin (Figure 2A–C). Strikingly, analysis of gene regions showed that the gene body exhibited the highest methylation levels in all three sequence contexts (CpG, CHG, and CHH) (Figure 2D–F). For CpG, CHG, and CHH, the average methylation levels per bin were 0.22, 0.26 and 0.044, respectively (Figure 2D–F). We further analyzed the 9-bp sequence features surrounding 5mC sites and found that the bases adjacent to the methylated cytosine at CHG and CHH sites were T[A/C]G and T[A/C]T[A/C], respectively (Figure 2G–I).
4. Discussion
Epigenetics is crucial for regulating gene expression, with DNA methylation being a primary modification mode [23]. Whole-genome DNA methylation has been studied in various species, including H. sapiens [24] and O. sativa [25]. In this study, we explored DNA methylation in N. ceranae, identifying 140,711 CpG, 170,035 CHG, and 1,053,635 CHH sites.
DNA methylation in CpG islands and in CHG and CHH contexts often represses repetitive sequences by methylating them, thereby protecting the genome [26]. However, N. ceranae, as a fungal parasite, lacks homologs of key methylation-related enzymes (e.g., DRM2 and CMT3) present in other species. In earlier work, we cloned Methyltransferase-like protein 5 (Mettl5) and N6-adenine-specific methyltransferase (N6AMT), which are related to methylation in N. ceranae [27,28]. Nevertheless, it is still uncertain whether N. ceranae possesses a methylation mechanism.
We observed higher CHH and lower CpG methylation levels in repetitive regions, consistent with the chromosome-wide patterns. This indicates that cytosine methylation in repeats mainly occurs in CHG and CHH contexts, in line with the reported enrichment of methylation in repetitive DNA sequences [29].
The TSS region is crucial for gene expression regulation [30]. Methylation in this region is associated with transcriptional silencing [31]. While our study identified 5mC methylation around the TSS and gene body regions, suggesting potential roles in gene regulation, we acknowledge that these patterns alone do not provide direct evidence of their functional significance. In other words, no functional correlation was established between the observed methylation and gene expression, which is a limitation of this study. 5mC is a common modification in eukaryotic genomes, especially in CpG dinucleotides [32]. We identified 5mCpG, 5mCHG, and 5mCHH motifs in N. ceranae, indicating conservation across species.
5. Conclusions
In conclusion, using ONT sequencing, we identified a wide range of methylation sites in N. ceranae, providing a foundation for further exploration of the epigenetic roles and mechanisms of DNA methylation in this organism. In future research we will explore the role of DNA methylation in N. ceranae virulence and its interactions with honeybee hosts.
References
1. Elhamamsy A.R. DNA methylation dynamics in plants and mammals: Overview of regulation and dysregulation. Cell Biochem. Funct. 2016, 34, 289–298. doi:10.1002/cbf.3183. PMID: 27003927.
2. Wang S., Wu W., Claret F.X. Mutual regulation of microRNAs and DNA methylation in human cancers. Epigenetics 2017, 12, 187–197. doi:10.1080/15592294.2016.1273308. PMID: 28059592. PMCID: PMC5406215.
3. Dahlet T., Argüeso Lleida A., Al Adhami H., Dumas M., Bender A., Ngondo R.P., Tanguy M., Vallet J., Auclair G., Bardet A.F. Genome-wide analysis in the mouse embryo reveals the importance of DNA methylation for transcription integrity. Nat. Commun. 2020, 11, 3153. doi:10.1038/s41467-020-16919-w. PMID: 32561758. PMCID: PMC7305168.
4. Li P., Yang H., Wang L., Liu H., Huo H., Zhang C., Liu A., Zhu A., Hu J., Lin Y. Physiological and transcriptome analyses reveal short-term responses and formation of memory under drought stress in rice. Front. Genet. 2019, 10, 55. doi:10.3389/fgene.2019.00055. PMID: 30800142. PMCID: PMC6375884.
5. Chen H., Shu H., Wang L., Zhang F., Li X., Ochola S.O., Mao F., Ma H., Ye W., Gu T. Phytophthora methylomes are modulated by 6mA methyltransferases and associated with adaptive genome regions. Genome Biol. 2018, 19, 181. doi:10.1186/s13059-018-1564-4. PMID: 30382931. PMCID: PMC6211444.
6. O'Brown Z.K., Boulias K., Wang J., Wang S.Y., O'Brown N.M., Hao Z., Shibuya H., Fady P.-E., Shi Y., He C. Sources of artifact in measurements of 6mA and 4mC abundance in eukaryotic genomic DNA. BMC Genom. 2019, 20, 445. doi:10.1186/s12864-019-5754-6. PMID: 31159718. PMCID: PMC6547475.
7. Suelves M., Carrió E., Núñez-Álvarez Y., Peinado M.A. DNA methylation dynamics in cellular commitment and differentiation. Brief. Funct. Genom. 2016, 15, 443–453. doi:10.1093/bfgp/elw017. PMID: 27416614.
8. Reik W., Walter J. Genomic imprinting: Parental influence on the genome. Nat. Rev. Genet. 2001, 2, 21–32. doi:10.1038/35047554. PMID: 11253064.
