De Novo Assembly, Characterization and Comparative Transcriptome Analysis of the Mature Gonads in Megalobrama terminalis
Yicheng Zhou, Weiqian Liang, Kaifeng Wang, Peng Zheng, Shengyue Lin, Haiying Yang, Guojun Cai, Ziyan Deng, Chong Han, Qiang Li

TL;DR
This study analyzes the gonadal transcriptomes of a South China fish to identify genes involved in sex differentiation and reproduction.
Contribution
The first comparative transcriptome analysis of gonads in Megalobrama terminalis, identifying key genes related to gonadal development.
Findings
84,886 unigenes were assembled, with 42,322 annotated to major databases.
14,972 differentially expressed genes were identified, including those related to steroidogenesis and gametogenesis.
Results provide insights into genes and pathways involved in gonadal differentiation in teleost fish.
Abstract
Megalobrama terminalis is an economically important fish indigenous to South China. However, there is less research on gonadal differentiation and development in M. terminalis. In this study, the gonadal transcriptome of female and male M. terminalis was analyzed for the first time. Many differentially expressed genes (DEGs) relevant to steroidogenesis, gonadal differentiation and development, and gametogenesis were identified after comparing the transcriptomes of male and female gonads. The annotation results of DEGs showed that some reported genes related to gonadal differentiation in teleost fishes may play essential roles in M. terminalis. Results of this study will provide data to support the reproduction and breeding of M. terminalis. Megalobrama terminalis is a significant aquatic fish in South China, renowned for its tasty meat. Nonetheless, related studies are deficient…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8- —National Natural Science Foundation of China
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenetic and Clinical Aspects of Sex Determination and Chromosomal Abnormalities · Reproductive biology and impacts on aquatic species · Sperm and Testicular Function
1. Introduction
Megalobrama are widely distributed freshwater fish in China, inhabiting the Yangtze River, Pearl River, Yellow River, and Hainan Island, among other river systems. They are favored by the public for their tender texture and exceptional taste, which makes them an important aquatic product in China [1,2,3]. Therefore, genomics [4], transcriptomics [5], molecular markers [6], and germplasm survey of this genus have been extensively investigated. However, research on M. terminalis, a species of Megalobrama, is limited.
M. terminalis belongs to Megalobrama, Culterinae, Cyprinidae, Cypriniformes, mainly distributed in the Pearl River system and Hainan Island water system, and it is an economically important fish endemic to the South China region [7,8]. Due to the superimposed effects of multiple anthropogenic factors such as hydraulic engineering, channel dredging, water pollution, and overfishing, M. terminalis resources have been decreasing since 1970, and M. terminalis resources dropped to roughly 5 tons in the 1980s [9,10]. A few studies have been conducted on M. terminalis, mainly focusing on the taxonomic status and evolutionary history through mitochondrial sequences [11,12,13], exploration and improvement of the feed composition [14,15], and artificial propagation and crossbreeding techniques for M. terminalis [16,17]. However, less attention has been paid to the gonadal development, differentiation, maturation, and gametogenesis of M. terminalis. To better promote the study of the mechanisms of gonadal development and reproduction of M. terminalis, it is vital to explore the genes and pathways related to gonadal differentiation using the RNA-seq technique.
Fish are different from higher vertebrates. Their sex determination and gonadal differentiation are intricate and are affected by environmental and genetic factors, and many other variables. Up to now, some important pathways and genes have been identified in fish [18,19]. Moreover, due to the diversity of fish species and living habitats, gonadal development and sex differentiation of fish are diversified [20]. Next-generation sequencing (NGS)-based transcriptome sequencing can quickly and cost-effectively generate large amounts of transcript sequences and mRNA expression data with high-throughput advantages, enabling efficient gene expression profiling and providing data support for fish research [21]. Currently, NGS-based transcriptome sequencing has been applied in many fish species such as Coreoperca whiteheadi [22], Scortum barcoo [23], Siganus oramin [24], and Spinibarbus hollandi [25]. These studies provide important insights into the mechanisms of gonadal development, reproduction-related genes, and complement the gaps in gonadal transcriptome studies in specific fish species.
In the present study, gonadal RNA-seq of female and male M. terminalis was performed using the Illumina platform, followed by de novo assembly and gene annotation. The comparative transcriptomics approach was employed to systematically analyze gene expression differences potentially involved in gonadal differentiation and gametogenesis. This study aims to further expand the existing genetic and genomic data resources, and through deepening gene expression analysis and functional gene mining, to screen and identify genes related to key reproductive processes such as gonadal differentiation and development and gametogenesis as comprehensively as possible, to lay a solid data foundation for subsequent in-depth investigation of the molecular mechanism of reproduction in M. terminalis.
2. Materials and Methods
2.1. Sampling
Six 2-year-old and sexually mature M. terminalis individuals (three females and three males) were acquired from an aquaculture farm in Shaoguan, Guangdong province. The fish were subsequently anesthetized and euthanized using MS-222 (STAHERB, Changsha, China) [26]. Gonadal tissues were immediately dissected, placed in an RNA keeper (Vazyme, Nanjing, China), and stored at −20 °C.
2.2. RNA Extraction and cDNA Library Construction
Total RNA from gonadal tissues was extracted using FreeZol regent (Vazyme, Nanjing, China), which was operated strictly according to the manufacturer’s instructions. RNA concentration and purity were initially checked using a NanoPhotometer N60 (Implen, Munich, Germany), and further assessed for integrity and quantitative analysis by an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Eukaryotic transcriptome sequencing libraries were constructed using the NEBNext^®^ Ultra™ RNA Library Preparation Kit (NEB, Ipswich, MA, USA), which includes the following steps: enrichment of mRNAs with polyA tails using Oligo (dT) magnetic beads, fragmentation of mRNAs by ultrasound; synthesis of mRNAs in an M-MuLV reverse transcriptase system using fragmented mRNAs and random oligonucleotides as primers. The first strand of cDNA was subsequently synthesized in the DNA polymerase I system using dNTPs as the raw material for the second strand. The purified double-stranded cDNA was end-repaired, added A tails, and ligated to the sequencing adaptor. cDNAs of about 200 bp were selected with AMPure XP beads (Beckman Coulter, Brea, CA, USA), PCR amplification was performed, and the PCR products were purified again using AMPure XP beads, and the libraries were finally obtained.
2.3. Library Sequencing, De Novo Assembly, and Annotation
Qualified libraries underwent a pair-end 150 sequencing strategy (PE150) utilizing the Illumina Novaseq6000 sequencing platform (Illumina, San Diego, CA, USA), yielding a minimum sequencing data volume of 6 Gb per sample. Raw sequencing data were filtered to remove reads with adapter contamination and low quality (N base >10%) to get clean reads, which were subsequently stored in FASTQ format. De novo assembly of the transcriptome was performed using Trinity 2.8.4 [27], and the quality was evaluated using N50 and BUSCO 3.0.2 [28]. At last, sequences of coding regions of the assembled unigene were predicted by six reading frames (3 forward and 3 reverse).
The predicted protein sequences were aligned with the Nr database (https://ftp.ncbi.nlm.nih.gov/blast/db/ (accessed on 20 April 2025)) and the SwissProt database (https://www.uniprot.org/ (accessed on 2 May 2025)), selecting the coding mode of the optimal alignment (maximum alignment score) as the gene’s coding region. Homology predictions of unigenes were subsequently done through comparisons with the Nr, KOG (https://www.ncbi.nlm.nih.gov/COG/ (accessed on 3 May 2025)), KEGG (http://www.kegg.jp, (accessed on 4 May 2025)), and SwissProt databases using the BLAST+ 2.6.0. The Nr results were subsequently annotated using Blast2GO 6.0 (GO, http://eggnog5.embl.de/download/emapperdb-5.0.2/ (accessed on 13 May 2025). Unigenes were categorized based on biological processes, cellular components, and molecular functions.
2.4. Identification of Differentially Expressed Genes and Enrichment Analysis
Clean reads from each sample were aligned to the assembled transcripts utilizing hisat2 [29], and gene expression metrics, including Fragments Per Kilobase of exon model per Million mapped fragments (FPKM) and coverage, were computed using RSEM 1.2.19 [30]. The python script prepDE.py was used to convert the output of stringtie to the compatible format of the edgeR package (V3.6), and edgeR was used for the analysis of differentially expressed genes (DEGs), aiming to identify the significantly differentially expressed genes with the filtering thresholds of FDR value < 0.05 and |log2FC| > 2. Finally, using the clusterProfiler program in the R package, significant GO terms and KEGG pathways (FDR < 0.05) were further enriched based on these DEGs following Fisher’s exact test and Benjamini-Hochberg correction [31].
2.5. Validation of DEGs Using Quantitative Real-Time PCR
The reliability of the data was confirmed through quantitative reverse transcription polymerase chain reaction (qRT-PCR). Sixteen DEGs related to gonadal differentiation and development were chosen for validation. β-actin was selected as the reference gene. Primers were designed using Primer-BLAST (Table 1). The specific steps were as follows: cDNA template was synthesized by HiScript III RT SuperMix (Vazyme, Nanjing, China), and amplified by SYBR Green qPCR Mix (GDSBIO, Guangzhou, China) on the LightCycler 480 platform. The reaction program was as follows: The reaction program included pre-denaturation at 95 °C for 3 min, followed by 40 cycles of 95 °C for 10 s, 60 °C for 20 s, and 72 °C for 20 s, concluding with a final extension at 72 °C for 5 min, which included melting curve analysis. Each sample was subjected to three biological replicates, and the results were normalized using the comparative CT method (2^−∆∆CT^) [32]. Results are presented as the mean ± standard error of the mean (Mean ± SEM).
2.6. Gonadal Histology
Gonadal tissues were first dehydrated in 70–100% alcohol in a gradient (gradient of 10%) and then made transparent with xylene. And then embedded in paraffin for 3 h before cutting into slices ranging from 3 to 6 μm. Qualified slices were dried overnight at 42 °C and stained with hematoxylin and eosin. The transparent slides were promptly sealed using neutral resin and left to air-dry at 25 °C overnight. Finally, slices were observed and photographed using a light microscope (Leica DM3000, Wetzlar, Germany) [23].
3. Results
3.1. Overview of Transcriptome Assembly Quality
cDNA libraries were constructed from three ovary and three testis samples using RNA-seq. Following data quality control and filtering of low-quality data, clean reads totaling 41.18 GB were generated, with an average of 6.86 GB per sample, utilizing the Illumina HiSeq platform. Base quality values Q20 and Q30 were 99.28% and 97.36%, respectively (Table 2). In addition, PCA results showed significant differences between testis and ovary samples and good consistencies in parallel samples (Figure 1). A total of 84,886 unigenes were successfully assembled, with an overall length of 92.26 MB. The lengths of individual unigenes varied from 201 to 20,142 bp, with an average length of 1086 bp and a N50 of 2238 bp (Table 3). However, the majority of unigenes were 1–1000 bp in length (70.7%) while those with lengths greater than 1000 accounted for only 29.3% of the total (Figure 2).
3.2. Unigene Annotation
A total of 42,322 unigenes were successfully annotated among all the assembled unigenes. The number of annotated unigenes and their percentage in the five major databases (Nr, SwissProt, KEGG, GO, and KOG) in descending order were 40,529, 95.8%; 25,877, 61.1%; 25,801, 60.9%; 25,055, 59.2%; and 20,139, 47.6%, respectively (Table 3). In the Nr annotation, the results showed that Megalobrama amblycephala had the largest number of homologous genes with M. terminalis (18,324), followed by Culter albumus (4915), while the remaining species had less than 4000 homologous genes (Figure 3).
In addition, all unigenes were further annotated in KEGG, GO, and KOG databases to predict their functions and categorize them. The KEGG database categorized 25,801 unigenes into metabolism, genetic information processing, environmental information processing, cellular processing and organismal systems (Figure 4A); In the GO database, a total of 25,055 unigenes were annotated into three functional categories of biological process, cellular component and molecular function and the most representative terms in each category were cellular process, cellular anatomical entity and binding (Figure 4B); Finally, the KOG database categorized 20,139 unigenes into 25 families, of which the two families with the highest proportion were signal transduction mechanisms and general function prediction only (Figure 4C). More results of KEGG and GO annotation can be found in Tables S1 and S2.
3.3. Differential Expression Analysis
A total of 14,972 DEGs were identified between ovary and testis samples, of which 11,928 were significantly upregulated in the testis and 3044 were significantly downregulated (Figure 5). KEGG enrichment analysis showed that among the top 25 significantly enriched pathways, there are multiple pathways related to gonad development, such as cell cycle, MAPK signaling pathway, and Calcium signaling pathway (Figure 6).
In addition, several DEGs related to steroidogenesis, gonadal differentiation and development, and gametogenesis were identified. Aromatase (cyp19a1a), cytochrome P450 26A1 (cyp26a1), estradiol 17-beta-dehydrogenase 1 (hsd17b1) and synaptonemal complex protein 2-like (sycp2l) are predominantly expressed in ovary; Doublesex- and mab-3-related transcription factor 1 (dnmrt1), transcription factor SOX-30-like (sox30), tektin-4 (tekt4), and cholesterol 25-hydroxylase-like protein (ch25h) are predominantly expressed in testis (Table 4).
3.4. Validation of Transcriptomic Data by qRT-PCR
The differential expression results of qRT-PCR, including 16 DEGs (dmrt1, cyp11b, sycp1, cyp26a1, rln3, cyp27a1, star, gdf9, bmp15, hsd17b12a, hsd17b1, foxl2, and zp3), were consistent with those of RNA-seq (Figure 7, Table 4).
3.5. Gonadal Histology Analysis
The testes and ovaries of the two-year-old M. terminalis were utilized for histological analysis. The testes had numerous spermatids and interstitial cells, coupled with a limited quantity of sperm (Figure 8A), whereas the ovaries were abundant in mature oocytes (Figure 8B). Histological results showed that the gonads of male and female M. terminalis were mature.
4. Discussion
Sex hormone synthesis, gonadal differentiation, and gametogenesis are complex processes involving numerous specific pathways and genes. Transcriptome sequencing has been widely used to obtain comprehensive genetic information from specific organs and tissues of organisms, as it effectively identifies differences in gene expression among different treatment groups. M. terminalis is a commercially significant fish in southeastern China. However, overfishing has resulted in a substantial decline of its wild resources in recent years, necessitating urgent research on its captive breeding and cultivation. In this study, genes related to gonadal differentiation and gametogenesis in M. terminalis were revealed by transcriptome sequencing for the first time.
4.1. DEGs Involved in Steroidogenesis
Sex hormones are classified as steroid hormones, essential for embryonic development, gonadal differentiation, and gametogenesis in teleost fish [33]. Steroids are synthesized from their precursor, cholesterol. Notably, within our annotated genes, cyp27a1 is involved in the initial hydroxylation reaction of cholesterol in the Japanese Lamprey Lethenteron reissneri [34], while ch25h, which is crucial for cholesterol metabolism, has been reported to exhibit high expression levels in the gonads of Cynoglossus semilaevis [35]. In addition, cyp11a1 encodes the type I cytochrome P450 enzyme located in mitochondria, which catalyzes the transformation of cholesterol into pregnenolone [36]. star encodes the steroidogenic acute regulatory protein (StAR), which facilitates the transport of cholesterol from the mitochondria and has been reported to regulate the transformation of cholesterol to steroids in some teleost fish [37,38].
11-Ketotestosterone (11-KT) and 17-estradiol (E2) are the major androgens and estrogens in teleost fish [25]. Their production requires the involvement of several P450 family and HSD family genes [39], and we identified several related genes. Previous studies have shown that hsd17b12a catalyzes the transformation of 11-ketoandrostenedione (11-KA4) to 11-KT in the testis of Anguilla japonica [40]. Additionally, hsd17b1 and hsd17b12a were predicted to participate in the synthesis of E2 in the ovaries of female Japanese eels [41]. The CYP11B enzyme facilitates the conversion of progesterone to the steroid testosterone and androstenedione in humans [42]. Concurrently, cyp11b was identified as being expressed in the ovaries and testes of Paralichthys olivaceus, and a reduction in the transcript level of cyp11b following E2 treatment implies its potential role in estrogen synthesis in fish [43]. Additionally, the deletion of cyp17a1 resulted in decreased E2 levels in zebrafish plasma, while concurrently increasing progesterone and DHP concentrations in the testes, indicating that cyp17a1 regulates sex hormone production in fish [44]. In summary, these DEGs play an important role in the steroidogenesis of M. terminalis, and the species may share the same pattern of steroidogenesis with other teleost fish.
4.2. DEGs Involved in Gonadal Differentiation and Development
In mammals, gonadal differentiation and development are regulated by several genes and pathways, such as the WNT/β-catenin signaling pathway [45,46], the sry, sox [47], dmrt, and fst gene family. In vertebrates, gonadal differentiation is more intricate in fish compared to other species. Fish show several patterns of sexual differentiation, ranging from hermaphroditic to differentiated or undifferentiated gonochoristic species [48]. Among the genes identified in this study, several have been reported to be associated with gonadal differentiation in fish. In mandarin fish and Pengze crucian carp, dmrt1 was found to be predominantly expressed in the testis [49], whereas in zebrafish ovary to testis transformation, dmrt1 expression was found to be upregulated [50], suggesting that dmrt1 mainly acts on testis development, which was similar to the result that the testis of M. terminalis had a higher dmrt1 expression than that in the ovaries. Similarly, dmrt2 is also highly expressed in the testis of M. terminalis and several other fish species, and its expression increases during testis development, suggesting that dmrt2 may have similar roles as dmrt1 [49,51]. Fsta is a follicular repressor-encoding gene, and treatment with follicular repressor reverses zebrafish females to males [52]. Thus, higher expression of fst in the testis of M. terminalis suggests that fst may also have a role in the development of its testis.
In addition, we identified several ovarian development-related genes. Foxl2 is a forkhead transcription factor essential for proper reproductive function in females. It is involved in almost all stages of ovarian development. A central role of FOXL2 is the lifetime maintenance of GC identity through the repression of testis-specific genes [53]. It has been reported to play a key role in ovarian differentiation in various fish species such as Paralichthys olivaceus [54,55]. The bone morphogenetic protein 15 (bmp15) and growth differentiation factor 9 (gdf9) genes are relevant members of the TGFβ superfamily that encode proteins secreted by the oocytes into the ovarian follicles [56]. In M. terminalis, foxl2, bmp15, and gdf9 were highly expressed in the ovary, suggesting that they may play an important role in ovarian development in M. terminalis.
4.3. DEGs Involved in Gametogenesis and Gamete Maturation
In aquaculture, optimal gamete development is crucial for reproduction and enhanced yield of economically important fish. Meiosis in spermatocytes and oocytes is an essential step in gametogenesis. cyb26a1 is identified as a gene that facilitates meiotic start in various fish species, and its downregulation accelerates the commencement of meiosis [57,58,59]. Additionally, both sycp1 and msh5 are essential for meiosis in mice. Mutations in sycp1 result in meiotic arrest and spermatocyte mortality in males [60], whereas such mutations in female zebrafish cause a decrease in normal offspring, and mutations in males cause sterility [61]. Murine Msh5 facilitates the synapsis of homologous chromosomes during meiotic prophase I [62]. In M. terminalis, all three meiosis-associated genes are upregulated in the testis, potentially correlating with the increased incidence of meiosis in the spermatheca.
We have also found genes exclusive to spermatogenesis and oogenesis. In Nile tilapias, mutations in rln3a result in decreased sperm tail length and the formation of two-tailed spermatozoa [63], while sox30-deficient female mice exhibit normal phenotypes, and males are sterile [64]. In addition, TEKT4 protein is localized to the flagellum of mouse spermatozoa, and deletion of tekt4 results in reduced fertility in male mice [65], suggesting that it is important for spermatogenesis. In addition, SYCP2L, which is important for meiosis in the oocyte, is a paralogue of the synaptonemal complex protein SYCP2 and is expressed exclusively in oocytes [66], and the gene sycp2l was highly expressed in the ovary of M. terminalis. These genes are differentially expressed in the gonads of M. terminalis, suggesting that they are important for spermatogenesis and oogenesis in M. terminalis.
5. Conclusions
We present the first transcriptome data from the gonads (testis and ovary) of M. terminalis, utilizing Illumina sequencing. A total of 14,972 DEGs were identified, with 11,928 exhibiting upregulation in the testis and 3044 showing downregulation. Furthermore, we found numerous DEGs linked to steroidogenesis, gonadal differentiation and development, and gametogenesis in mammals and teleost fish, indicating that these genes are likely to play significant roles in M. terminalis. The results will provide data to underpin further studies on gonadal differentiation in M. terminalis, as well as its reproduction and breeding.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Chen J. Population Genetics of Megalobrama Based on Whole Genome Resequencing Huangzhong Agricultural University Wuhan, China 2021(In Chinese with English Abstract)
- 2Lai R. Zhang X. Li Y. Wu J. Yang D. Wang W. Comparison and phylogenetic analysis of mitochondrial genomes of Megalobrama J. Fish. China 201438114(In Chinese with English Abstract)
- 3Gong D. Wang X. Yang J. Liang J. Tao M. Hu F. Wang S. Liu Z. Tang C. Luo K. Protection and utilization status of Parabramis and Megalobrama germplasm resources Reprod. Breed.20233263410.1016/j.repbre.2023.01.003 · doi ↗
- 4Hu X. Luan P. Cao C. Li C. Jia Z. Ge Y. Shang M. Wang S. Meng Z. Tong J. Characterization of the mitochondrial genome of Megalobrama terminalis in the Heilong River and a clearer phylogeny of the genus Megalobrama Sci. Rep.20199850910.1038/s 41598-019-44721-231186443 PMC 6559948 · doi ↗ · pubmed ↗
- 5Liu K. Xie N. Full-length transcriptome assembly of black amur bream (Megalobrama terminalis) as a reference resource Mol. Biol. Rep.202451110110.1007/s 11033-024-10056-z 39470845 · doi ↗ · pubmed ↗
- 6Gao Z. Luo W. Liu H. Zeng C. Liu X. Yi S. Wang W. Fuentes J. Transcriptome analysis and SSR/SNP markers information of the blunt snout bream (Megalobrama amblycephala)P Lo S ONE 20127 e 4263710.1371/journal.pone.004263722880060 PMC 3412804 · doi ↗ · pubmed ↗
- 7Xu W. Xiong B. Research progress of Megalobrama in China J. Hydroecology 200829711(In Chinese with English Abstract)
- 8Liu Y. Li X. Li Y. Yang G. Xu T. Histological study on gonad development of Megalobrama terminalis South China Fish. Sci.201915113118(In Chinese with English Abstract)
