A comprehensive analysis of the inherited lncRNA and circRNA repertoire of zebrafish
Dheeraj Chandra Joshi, Aakanksha Kadam, Chetana Sachidanandan, Beena Pillai

TL;DR
This paper identifies inherited lncRNAs and circRNAs in zebrafish, showing they are conserved and may play roles in development.
Contribution
A comprehensive resource of inherited lncRNAs and circRNAs in zebrafish, revealing their conservation and potential developmental roles.
Findings
Nearly 20% of lncRNAs and 7% of circRNAs in zebrafish are inherited.
Many inherited lncRNAs are conserved in mammals and expressed in adult zebrafish tissues.
Inherited circRNAs originate from genes important for fertilization and may regulate translation.
Abstract
Inherited non-coding RNAs can be the third major component of epigenetic information transfer from one generation to the next. Here, we present a comprehensive resource of lncRNAs and circular RNAs that are inherited, compiled from meta-analysis of zebrafish transcriptomics data and comparative genomics with mouse and human. Maternal and paternal inheritance of mRNA into the zygote is accepted to be an important regulator of embryonic development as well as adult characteristics. Although inheritance of certain specific miRNAs is known, other non-coding RNA inheritance remains less explored. We performed a comprehensive analysis of the inherited lncRNAs and circRNAs in zebrafish. We discovered that nearly 20% of all known lncRNA and 7% of circRNAs are inherited. Many of these lncRNAs are conserved in mammals, and are expressed widely in adult tissues of zebrafish. The male and female…
Funding: Council for Scientific and Industrial Research (CSIR), India
Taxonomy
Topics: Cancer-related molecular mechanisms research · Circular RNAs in diseases · MicroRNA in disease regulation
1 Introduction
The genetic information encoded by DNA can be regulated and transmitted through non-DNA-based means collectively termed epigenetic inheritance. Such mechanisms play crucial roles in cellular function, embryonic development, and adaptation. Besides the parental DNA, the zygote is also known to inherit specific DNA methylation patterns, histone modification patterns, proteins, mRNAs, and regulatory RNAs that influence how the genome is read and interpreted. These factors can have housekeeping or developmental roles and could be involved in the adaptation to changing environmental conditions and stressors. Small RNA inheritance has been particularly well studied in Caenorhabditis elegans. Inherited endo-siRNAs, miRNAs, and piRNAs in C. elegans play important roles in processes as diverse as maintaining genome integrity (Lee et al. 2012), clearance of maternal transcripts (Rouget et al. 2010), adaptation to pathogens (Kaletsky et al. 2020), and starvation (Rechavi et al. 2014). We have previously shown the inheritance of miR-34 in zebrafish (Soni et al. 2013), and others have subsequently shown that reduced levels of miR-34/449 in mouse sperm are crucial for the inheritance of a phenotype associated with chronic social instability stress (Champroux et al. 2024). Parental inheritance of other types of small RNA has also been shown; for instance, tRFs (tRNA fragments) or tsRNAs (tRNA-derived small RNAs) dominate the small RNA repertoire of mouse sperm, comprising 60%–70% of it (Conine et al. 2018). Additionally, mouse sperm contain miRNAs (∼10%–20%), which have been implicated in both normal embryonic development and the inheritance of traits. MicroRNAs inherited from mouse sperm, such as miR-880, the miR-17–92 and miR-106b-25 clusters, and miR-34b/c, have been shown to be important for embryo viability (Conine et al. 2019).
Long regulatory RNAs such as lncRNAs and circRNAs are emerging as important components of epigenetic regulation. High throughput sequencing datasets have revealed the presence of hundreds to thousands of lncRNAs in gametes and zygote of diverse organisms including mouse (Karlic et al. 2017, Zhang et al. 2017), pig (Yang et al. 2022), boar (Fraser et al. 2020), cattle (Wang et al. 2020, 2023), ram (Hitit et al. 2024), and humans (Zhang et al. 2019, Corral-Vazquez et al. 2021). Some studies have also reported an association between sperm lncRNA profile and fertility in ram (Hitit et al. 2024) and humans (Zhang et al. 2019). One recent study in mice showed altered lncRNA profile in sperm in response to stress hormone and injecting the altered lncRNAs in normal zygotes resulted in behavioral phenotypes in the offspring (Hoffmann et al. 2024).
Circular RNAs (circRNAs) are covalently closed RNA molecules that are resistant to exonuclease action. Since their discovery around 40 years ago, numerous circRNAs have been identified in gametes and zygote of various organisms including pig (Cao et al. 2019), humans and mouse (Ragusa et al. 2019). Some studies have also reported the presence of hundreds of circRNAs in pre-implantation embryos of humans (Dang et al. 2016) and mouse (Fan et al. 2015). circRNAs are associated with male infertility (Manfrevola et al. 2020, Tang et al. 2023) and ovarian maturation (Li et al. 2021).
The presence of lncRNAs and circRNAs in gametes of all these species suggests that their inheritance in the zygote could be a widespread phenomenon. Unlike the inherited small non-coding RNAs, the inheritance of lncRNAs and circRNAs has not been well studied. The ex utero fertilization of zebrafish makes them an excellent model to study RNA inheritance. Extensive knowledge of zygotic genome activation and early development in zebrafish lends itself to functional studies of inherited RNAs.
Here, we have identified the inherited lncRNA and circRNA repertoire of zebrafish and explored its features. We identified 2093 lncRNAs and 270 circRNAs that are inherited in zebrafish. The genes neighboring the inherited lncRNAs are enriched for pathways related to translation and embryonic development. The majority of inherited lncRNAs and circRNAs are present in both gametes, but there is a distinct population of inherited lncRNAs and circRNAs that is exclusively of maternal origin. We find that more than 250 inherited lncRNAs of zebrafish have at least one conserved counterpart in the inherited lncRNA pool of both mouse and human. Notably, we found that the majority of inherited lncRNAs and circRNAs retain their zygotic expression until 4 h post-fertilization (hpf), with no degradation associated with zygotic genome activation (ZGA). This counters the idea that there is a significant turnover and replacement of the RNA population around the ZGA phase.
2 Methods
2.1 Meta-analysis of publicly available RNA sequencing datasets to identify inherited lncRNAs
We identified two samples (two different studies) of sperm, seven samples (four different studies) of oocytes and 10 samples (four different studies) of zygote RNA sequencing that were publicly available for zebrafish (Supplementary File S1, available as supplementary data at Bioinformatics Advances online). Similarly, we identified four sperm samples from one study and four oocytes and two zygote samples from another study for mouse (Supplementary File S1, available as supplementary data at Bioinformatics Advances online). For human, we identified four sperm samples from one study, three oocytes and five zygote samples from another study (Supplementary File S1, available as supplementary data at Bioinformatics Advances online). All these sequencing files were downloaded from the Sequence Read Archive (SRA) of NCBI. The publicly available RNA sequencing datasets analysed for the identification of inherited lncRNAs in this study are a mix of unstranded and stranded datasets (Supplementary File S1, available as supplementary data at Bioinformatics Advances online). For uniformity, we analysed all datasets as unstranded. The data were then analysed using the bioinformatic pipeline outlined in Fig. 1. Briefly, a quality check on all the RNA-seq datasets was performed using FastQC, and low-quality reads and adapters were trimmed using Trimmomatic-0.39 with the following parameters: ILLUMINACLIP:adapters.fa:2:30:10 SLIDINGWINDOW:5:20 LEADING:3 TRAILING:3 MINLEN:50. Reads were aligned to the genome using the splice-aware aligner STAR (version 2.7.1a) with the default parameters (STAR --genomeDir /path/to/genomeDir --runThreadN 16 --sjdbGTFfile /path/to/gtf_file --readFilesIn input_fastq --sjdbOverhang 100 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix Output_bam --quantMode TranscriptomeSAM). The genome assemblies used for zebrafish, mouse, and human are GRCz11, GRCm38.p6, and GRCh38.p13, respectively.
GTF files were obtained from the Ensembl database for the analysis. Zebrafish: Danio_rerio.GRCz11.99.gtf (source: https://ftp.ensembl.org/pub/release-99/gtf/danio_rerio/, 2019-11-22), genome assembly GRCz11, release-99. Mouse: Mus_musculus.GRCm38.99.gtf (source: https://ftp.ensembl.org/pub/release-99/gtf/mus_musculus/, 2019-11-23), genome assembly GRCm38, release-99. Human: Homo_sapiens.GRCh38.99.gtf (source: https://ftp.ensembl.org/pub/release-99/gtf/homo_sapiens/, 2019-11-22), genome assembly GRCh38, release-99. Expression quantification was done using the rsem-calculate-expression script of the RSEM package (version 1.3.1) with the following command: rsem-calculate-expression --quiet -p 16 --bam /path/to/bam --output rsem_output --reference /path/to/reference. For each species, the resulting expression values (FPKM) were filtered for lncRNAs (i.e. the transcripts annotated as antisense, lncRNA, lincRNA, retained_intron, sense_intronic, sense_overlapping, and processed_transcript in the Ensembl genome browser).
Figure 1. Around 20% of lncRNAs in zebrafish are inherited. Inherited lncRNAs arise from all chromosomes and are mostly located in the vicinity of metabolic and developmental genes. (A) Strategy used to identify the inherited lncRNAs in zebrafish. (B) Stacked bar plots showing the percentage of inherited lncRNAs and mRNAs. (C) Stacked bar plots showing the subcategories of lncRNAs. The total number of lncRNAs is written above each stacked bar, and the number in each subcategory is written on top of the corresponding segment. The schematics of the different types of lncRNAs are shown on the right of the stacked bar plots. (D) Density plot showing the genomic loci of inherited lncRNAs. The 25 chromosomes of zebrafish are scaled to the same length with zero representing the start of the chromosome and 1000 representing the end of the chromosome. Overall density of all known lncRNAs (10 109) is shown in the background (pink) and the density of inherited lncRNAs (2093) is shown in the foreground (black). (E) GO analysis of protein-coding genes overlapping or flanking genic and intergenic inherited lncRNAs.
To identify inherited lncRNAs in zebrafish, we first shortlisted the transcripts expressed (≥0.1 FPKM) in at least one sample of every zygote study. Shortlisted transcripts that were also expressed (≥0.1 FPKM) in at least one sample of any sperm or oocyte study were considered inherited. To identify the inherited lncRNAs in mouse and human, more stringent criteria were used, as the samples are derived from only two publicly available studies: lncRNAs expressed in the zygote (≥0.1 FPKM in all samples) and in at least one of the gametes (≥0.1 FPKM in all samples) were considered inherited.
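The zebrafish selection rule can be expressed as a short filter. The sketch below is an illustration under stated assumptions, not the authors' code: the data layout (study name mapped to a list of samples, each sample mapping transcript ID to FPKM), the function names, and the toy values are all hypothetical.

```python
FPKM_CUTOFF = 0.1  # detection threshold stated in the text


def expressed_in_study(samples, transcript, cutoff=FPKM_CUTOFF):
    """True if the transcript reaches the cutoff in at least one sample of a study."""
    return any(sample.get(transcript, 0.0) >= cutoff for sample in samples)


def inherited_transcripts(zygote_studies, gamete_studies, cutoff=FPKM_CUTOFF):
    """Transcripts detected in every zygote study and in at least one gamete study."""
    candidates = set()
    for samples in zygote_studies.values():
        for sample in samples:
            candidates.update(sample)
    inherited = set()
    for tx in candidates:
        in_every_zygote_study = all(
            expressed_in_study(samples, tx, cutoff)
            for samples in zygote_studies.values()
        )
        in_any_gamete_study = any(
            expressed_in_study(samples, tx, cutoff)
            for samples in gamete_studies.values()
        )
        if in_every_zygote_study and in_any_gamete_study:
            inherited.add(tx)
    return inherited


# Toy FPKM values (hypothetical).
zygote = {
    "study_A": [{"t1": 0.5, "t2": 0.05}, {"t1": 0.0, "t2": 0.3}],
    "study_B": [{"t1": 1.2, "t2": 0.0, "t3": 4.0}],
}
gametes = {
    "sperm_study": [{"t1": 0.2}],
    "oocyte_study": [{"t3": 2.0}],
}
print(inherited_transcripts(zygote, gametes))  # {'t1'}
```

Here `t2` fails because it is absent from one zygote study, and `t3` fails for the same reason even though it is present in oocytes. The stricter mouse and human rule would replace `any` with `all` inside `expressed_in_study`.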
2.2 Meta-analysis of publicly available RNA sequencing datasets to check the adult tissue expression profiles of inherited lncRNAs
For checking the expression of inherited lncRNAs in the adult tissues of zebrafish, we again used publicly available RNA sequencing datasets. We identified four studies that had profiled the expression in 15 adult tissues of zebrafish (Supplementary File S1, available as supplementary data at Bioinformatics Advances online). The tissues are brain, heart, muscle, liver, skin, gill, intestine, gut, testis, ovary, blood, spleen, bones, eye, kidney, and tail. These samples were analysed as per the analysis pipeline described above. Hierarchical clustering of the lncRNA expression profiles was performed using the linkage function from the scipy.cluster.hierarchy module.
2.3 Conservation analysis
For sequence conservation analysis, the nucleotide sequences of 2093, 3785, and 4120 inherited lncRNAs in zebrafish, mouse, and human, respectively, were extracted from the reference transcriptome using samtools faidx. The zebrafish lncRNAs were searched for similarity against mouse and human sequences using the stand-alone BLAST utility of NCBI. An E-value cutoff of 1e−5 and a word size of 20 were used to identify potentially sequence-conserved lncRNAs, with no limit on query coverage, as regulatory elements can be embedded as small regions of similarity within a large lncRNA.
For identifying overlap-conserved inherited lncRNAs, we first obtained a list of conserved protein-coding genes in zebrafish, mouse, and human using the BioMart utility of the Ensembl genome browser. Next, we extracted the genomic coordinates of inherited lncRNAs and conserved protein-coding genes of zebrafish, mouse, and human from their respective GTF files. Using bedtools intersect, inherited lncRNAs that overlap conserved protein-coding genes were shortlisted for each species. If a protein-coding gene conserved in all three species also overlaps inherited lncRNAs in all three species, these lncRNAs were considered to be conserved by overlap.
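The overlap-conservation logic can be illustrated with a small pure-Python stand-in for the interval intersection step. This is a sketch under assumptions, not the authors' pipeline: intervals are taken as 1-based inclusive GTF coordinates with strand ignored, orthology is represented by a hypothetical `group_of` mapping from gene ID to ortholog group, and all names and coordinates are invented for the example.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def genes_hit(lncrnas, genes):
    """Map each lncRNA ID to the set of gene IDs it overlaps (strand ignored)."""
    return {
        lnc_id: {gid for gid, g in genes.items() if overlaps(iv, g)}
        for lnc_id, iv in lncrnas.items()
    }


def overlap_conserved(zf_hits, mouse_hits, human_hits, group_of):
    """Zebrafish lncRNAs overlapping a gene whose orthologs are also
    overlapped by inherited lncRNAs in both mouse and human."""
    def groups(hits):
        return {group_of[g] for gs in hits.values() for g in gs if g in group_of}

    shared = groups(mouse_hits) & groups(human_hits)
    return {
        lnc for lnc, gs in zf_hits.items()
        if any(group_of.get(g) in shared for g in gs)
    }


# Toy coordinates and orthology (all hypothetical).
zf_hits = genes_hit(
    {"zl1": ("chr1", 100, 500), "zl2": ("chr2", 100, 200)},
    {"zgA": ("chr1", 400, 900), "zgB": ("chr2", 1000, 2000)},
)
mouse_hits = genes_hit({"ml1": ("chr5", 10, 50)}, {"mgA": ("chr5", 40, 80)})
human_hits = genes_hit({"hl1": ("chr7", 10, 50)}, {"hgA": ("chr7", 50, 90)})
group_of = {"zgA": "A", "mgA": "A", "hgA": "A", "zgB": "B"}

print(overlap_conserved(zf_hits, mouse_hits, human_hits, group_of))  # {'zl1'}
```

The quadratic scan in `genes_hit` is chosen for readability; a sorted sweep or an interval tree would be preferable at genome scale.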
For identifying loci-conserved inherited lncRNAs, 10 neighboring protein-coding genes for each inherited lncRNA were extracted (five each upstream and downstream) based on the genomic coordinates obtained from GTF files of zebrafish, mouse, and human. Thus, corresponding to the 2093, 3785, and 4120 inherited lncRNAs in zebrafish, mouse, and human, respectively, as many sets of 10 flanking protein-coding genes were formed. The sets of flanking protein-coding genes from zebrafish were compared with the sets from both mouse and human for conservation relationships. Zebrafish inherited lncRNAs with five or more flanking protein-coding genes that have conserved counterparts in both mouse and human flanking protein-coding gene sets were considered to be conserved by loci.
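One way to read the loci-conservation rule is sketched below. It is an interpretation rather than the authors' implementation: it assumes a zebrafish lncRNA qualifies when at least five ortholog groups from its flanking set also appear in the flanking set of some mouse inherited lncRNA and some human inherited lncRNA. The `group_of` orthology mapping, function names, and toy data are hypothetical.

```python
def flanking_genes(lnc, genes, n=5):
    """IDs of up to n genes on each side of an lncRNA.

    lnc is (chrom, start, end); genes is a list of (chrom, start, end, gene_id).
    Genes are ordered by start coordinate; overlapping genes are excluded.
    """
    same_chrom = sorted((g for g in genes if g[0] == lnc[0]), key=lambda g: g[1])
    upstream = [g for g in same_chrom if g[2] < lnc[1]][-n:]
    downstream = [g for g in same_chrom if g[1] > lnc[2]][:n]
    return {g[3] for g in upstream + downstream}


def is_loci_conserved(zf_flank, mouse_flanks, human_flanks, group_of, min_shared=5):
    """True if at least min_shared ortholog groups from the zebrafish flanking
    set are shared with one mouse flanking set and one human flanking set."""
    def to_groups(gene_ids):
        return {group_of[g] for g in gene_ids if g in group_of}

    zf_groups = to_groups(zf_flank)
    for mouse_set in mouse_flanks:
        shared_with_mouse = zf_groups & to_groups(mouse_set)
        if len(shared_with_mouse) < min_shared:
            continue
        for human_set in human_flanks:
            if len(shared_with_mouse & to_groups(human_set)) >= min_shared:
                return True
    return False


# Toy example: twelve evenly spaced genes and one lncRNA between g6 and g7.
toy_genes = [("chr1", i * 100, i * 100 + 50, f"g{i}") for i in range(1, 13)]
toy_flank = flanking_genes(("chr1", 660, 680), toy_genes)
print(sorted(toy_flank, key=lambda g: int(g[1:])))  # g2 through g11
```

Counting ortholog groups instead of individual genes avoids double counting when one gene has several paralogs in another species.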
2.4 GO analysis
The protein-coding genes overlapping genic inherited lncRNAs and the two protein-coding genes flanking intergenic lncRNAs were extracted from the GTF file. The parent genes of inherited circRNAs were identified by the CIRI2 pipeline. The ShinyGO 0.80 tool was used to identify and categorize the biological processes enriched among these protein-coding genes.
2.5 Analysis of RNA sequencing data of post-fertilization stage and clustering
The publicly available RNA sequencing data of 0 hpf, 1 hpf, 2 hpf, 3 hpf, and 4 hpf zebrafish embryos were reanalysed using the pipeline described in Section 2.1. The details of these samples are given in Supplementary File S1 (available as supplementary data at Bioinformatics Advances online). The expression levels of 2093 inherited lncRNAs and 270 inherited circRNAs were extracted for the five stages. To classify the lncRNAs and circRNAs into distinct expression profiles, we applied the K-means clustering algorithm using the scikit-learn library in Python.
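To make the clustering step concrete, the following is a dependency-free sketch of Lloyd's K-means applied to five-point expression profiles. It is a stand-in for illustration only and is not scikit-learn's `KMeans`: it uses a deterministic farthest-point initialization as a simplifying assumption, and the function name and toy profiles are hypothetical.

```python
def _sqdist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, max_iter=100):
    """Cluster profiles into k groups; returns (labels, centroids)."""
    # Deterministic farthest-point initialization (an assumption of this sketch).
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        farthest = max(profiles, key=lambda p: min(_sqdist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every profile.
        new_labels = [
            min(range(k), key=lambda j: _sqdist(p, centroids[j])) for p in profiles
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy profiles at 0, 1, 2, 3, and 4 hpf (hypothetical values):
# two stable transcripts followed by two that decline.
toy_profiles = [
    [5.0, 5.0, 5.0, 5.0, 5.0],
    [4.8, 5.1, 5.0, 4.9, 5.0],
    [5.0, 4.0, 3.0, 1.0, 0.2],
    [4.9, 3.8, 2.7, 1.2, 0.1],
]
toy_labels, toy_centroids = kmeans(toy_profiles, k=2)
print(toy_labels)  # [0, 0, 1, 1]
```

With real data, profiles are often scaled per transcript before clustering so that groups reflect the shape of the trajectory and not the absolute abundance; whether such scaling was applied here is not stated in the text.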
2.6 Meta-analysis of publicly available RNA sequencing datasets to identify inherited circRNAs
We identified two samples (two different studies) of sperm, seven samples (four different studies) of oocytes, and 10 samples (four different studies) of zygote RNA sequencing that were publicly available for zebrafish (Supplementary File S1, available as supplementary data at Bioinformatics Advances online). The raw RNA sequencing reads were subjected to a quality check using FastQC, after which we discarded low-quality reads and Illumina adapters using Trimmomatic (Bolger et al. 2014). To quantify the expression of linear RNA, we aligned the processed RNA sequencing reads to the reference genome (GRCz11) using the STAR aligner (Dobin et al. 2013). We used RNA-seq by Expectation-Maximization (RSEM) (Li and Dewey 2011) to quantify the expression of linear RNA transcripts and isoforms.
We used the CIRI2 pipeline (Gao et al. 2018) with default settings to identify candidate circRNAs. The processed reads were aligned to the reference genome (GRCz11) using the BWA aligner (Li and Durbin 2009). The output file from this alignment was used to annotate circRNAs using a Perl script available in the CIRI2 package. We customized the Perl script for our analysis: a circRNA with both coordinates outside any annotated gene boundaries was considered intergenic, and it was considered exonic or intronic if even one of the coordinates was located within an annotated gene exon or intron. A candidate circRNA was called if it was supported by a minimum of two backsplice reads and a junction ratio of at least 0.05, which indicates the expression of a circRNA relative to its linear RNA. We then proceeded to identify inherited circRNA candidates. A circRNA was considered to be inherited only if it was present in a zygotic sample along with either of the gametes. We identified 270 inherited circRNAs.
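The calling and inheritance criteria for circRNAs can be summarized in a short filter. This sketch assumes that backsplice read counts and junction ratios have already been parsed from the CIRI2 output into dictionaries; the field names `bsj_reads` and `junction_ratio`, the function names, and the toy records are hypothetical and do not reflect CIRI2's actual column names.

```python
MIN_BACKSPLICE_READS = 2   # minimum supporting backsplice reads (from the text)
MIN_JUNCTION_RATIO = 0.05  # minimum junction ratio (from the text)


def passes_call(record):
    """Apply both thresholds to one circRNA record from one sample."""
    return (
        record["bsj_reads"] >= MIN_BACKSPLICE_READS
        and record["junction_ratio"] >= MIN_JUNCTION_RATIO
    )


def called_in_any(samples):
    """circRNA IDs that pass the call thresholds in at least one sample."""
    return {
        circ_id
        for sample in samples
        for circ_id, record in sample.items()
        if passes_call(record)
    }


def inherited_circrnas(zygote_samples, sperm_samples, oocyte_samples):
    """circRNAs called in a zygote sample and in either gamete."""
    in_gametes = called_in_any(sperm_samples) | called_in_any(oocyte_samples)
    return called_in_any(zygote_samples) & in_gametes


# Toy records keyed by a backsplice junction ID (hypothetical).
zygote_toy = [{
    "chr3:100|900": {"bsj_reads": 4, "junction_ratio": 0.20},
    "chr5:50|700": {"bsj_reads": 1, "junction_ratio": 0.50},
    "chr9:10|400": {"bsj_reads": 6, "junction_ratio": 0.30},
}]
sperm_toy = [{"chr3:100|900": {"bsj_reads": 2, "junction_ratio": 0.05}}]
oocyte_toy = [{"chr5:50|700": {"bsj_reads": 8, "junction_ratio": 0.40}}]

print(inherited_circrnas(zygote_toy, sperm_toy, oocyte_toy))  # {'chr3:100|900'}
```

In this toy case the chr5 candidate is rejected because it has only one backsplice read in the zygote, and the chr9 candidate is rejected because it is absent from both gametes.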
2.7 Comparative analysis of circular RNAs and their linear isoforms
To investigate the host genes from which inherited circRNAs were transcribed, we annotated the circRNAs by linking their genomic positions to zebrafish gene loci. The analysis revealed that a single gene could produce multiple circRNA isoforms. Approximately six inherited circRNAs were found to span the coordinates of more than one gene. Among these, three circRNAs originated from overlapping genes with high sequence similarity. In contrast, two circRNAs aligned to distinct genes with dissimilar sequences. One circRNA, predicted to be approximately 145 kb in length, aligned with three distant genes. To compare circRNAs with their host RNAs, we analysed the expression of each circRNA relative to the host genes to which they aligned. We quantified the expression of all linear RNAs in zebrafish using bioinformatics analysis (Fig. 5A) and compared it with the normalized expression levels of each inherited circRNA.
3 Results
3.1 20% of zebrafish lncRNAs are inherited
To identify inherited lncRNAs and circRNAs in zebrafish, we utilized the publicly available RNA sequencing datasets of gametes (sperm and oocyte) and zygote (Supplementary File S1, available as supplementary data at Bioinformatics Advances online). The datasets were reanalysed at transcript level using the latest zebrafish assembly, GRCz11 (Fig. 1A). We reasoned that a transcript detectable in zygote is likely to be inherited since the zygote is known to be transcriptionally inactive. To bolster our conclusion, we also checked whether these zygotic transcripts are detected in at least one of the gametes. The transcripts passing these criteria were considered to be inherited.
We found that approximately 20% of all known lncRNAs in zebrafish are inherited, which amounts to 2093 inherited lncRNAs out of 10 109 total lncRNAs (Fig. 1B). In contrast, approximately 40% of all mRNAs are inherited. Of the 2093 inherited lncRNAs identified in this study, the majority (∼80%) are genic lncRNAs, whereas the rest are intergenic (Fig. 1C; Supplementary File S2, available as supplementary data at Bioinformatics Advances online). In this study, we use the term “genic” to refer to lncRNAs arising from protein-coding loci. The percentage of genic lncRNAs in the inherited pool (80%) is greater than the percentage of genic lncRNAs overall (65%), showing a bias toward coding regions in the inherited pool. The genic lncRNAs are further subcategorized into antisense and sense overlapping lncRNAs.
The genomic locations of lncRNAs and their neighboring protein-coding genes can reveal their likely function, as several lncRNAs are known to have cis-regulatory effects on the genomic loci of their origin. To explore whether the inherited lncRNAs arise from select locations in the genome, such as telomeres or heterochromatin, we visualized the density of inherited lncRNAs versus the overall density of lncRNAs on each of the zebrafish chromosomes. We found that inherited lncRNAs arise almost uniformly from all over the genome (Fig. 1D), with the exception of two notable loci in chr4 and chr25. These genomic regions, spanning several megabases, do not give rise to any inherited RNAs, even though they contain many lncRNA and protein-coding genes. To understand which categories of genes are enriched among the protein-coding genes at inherited lncRNA loci, we performed a Gene Ontology (GO) analysis at the “Biological Processes” level for genic and intergenic lncRNAs. We found that protein-coding genes in the vicinity of genic inherited lncRNA genes are mainly involved in processes like metabolism and cell cycle. On the other hand, protein-coding neighbors of the intergenic inherited lncRNA loci are mainly enriched for specific developmental processes (Fig. 1E). Both categories of genes are crucial for early development, and inherited lncRNAs may be regulators of these genes.
3.2 Inherited lncRNAs correspond to broadly expressed zygotic lncRNA
One of the typical features associated with the lncRNAs is that they are low expression transcripts as compared to protein-coding transcripts (mRNAs). We plotted the zygotic expression of all the inherited lncRNAs and inherited mRNAs in the form of violin plots. Consistent with the generalization regarding low expression of lncRNAs, we found that inherited lncRNAs have a four-fold lower median expression than inherited mRNAs in the zygote (Fig. 2A). However, the expression range of inherited lncRNAs is very broad and a minority of inherited lncRNAs (359/2093) are present at levels higher than the median mRNA expression in the zygote.
Figure 2. Inherited lncRNAs have a broad expression range in the zygote. Inherited lncRNAs show similar expression and diversity in sperm and oocytes. (A) Violin plots showing the expression of all inherited lncRNAs (2093) and inherited mRNAs (17 203) in the zygote. The expression levels of well-known inherited mRNAs have been marked. (B) Violin plots showing the expression of antisense (219), sense overlapping (1476), and intergenic (398) inherited lncRNAs in the zygote. The expression levels of some notable inherited lncRNAs in each category are marked. (C) Scatter plot showing the expression levels of 2093 inherited lncRNAs in the sperm and oocytes.
We also found some variation in expression levels among the inherited lncRNA subcategories (antisense, sense overlapping, and intergenic). The median expression of sense overlapping inherited lncRNAs is the highest, approximately two-fold higher than that of the antisense and intergenic inherited lncRNAs (Fig. 2B). We also identified some notable inherited lncRNAs (Fig. 2B): for example, several inherited antisense lncRNAs are derived from hox loci, and many sense overlapping inherited lncRNAs are derived from ribosomal protein genes (rpl/rps genes) (Supplementary File S3, available as supplementary data at Bioinformatics Advances online). In the intergenic category, we found two well-known lncRNAs: MALAT1, traditionally studied for its role in cancer, and Cyrano, previously shown experimentally to be inherited in zebrafish by our group (Sarangdhar et al. 2018).
3.3 Sperm and oocytes contain similar lncRNA pool
Traditionally, sperm was considered to be merely a vehicle for paternal DNA during fertilization, and it was believed that all the non-DNA factors were maternally supplied. However, recent studies have shown that sperm-derived miRNAs and tsRNAs are delivered to the zygote (Chen et al. 2016). To understand whether the inherited lncRNAs in zebrafish are sperm-derived or oocyte-derived, we plotted the expression of the 2093 inherited lncRNAs in sperm and oocyte as a scatter plot. We found that the majority of inherited lncRNAs are expressed in both sperm and oocyte at similar levels (Fig. 2C). We also found 62 oocyte-specific and six sperm-specific inherited lncRNAs (Fig. 2C; Supplementary File S4, available as supplementary data at Bioinformatics Advances online). The six exclusively paternally inherited lncRNAs overlap with protein-coding loci, including one arising from the Dmrt1 gene locus, which is known to be involved in transcriptional regulation during germ cell lineage commitment. The most abundant oocyte-specific lncRNA overlaps with the gene Csde1, previously implicated in brain development.
3.4 Post-fertilization stability of inherited lncRNAs is maintained until ZGA, with broad expression observed across multiple adult tissues
Next, we wanted to understand the functional relevance of inherited lncRNAs. One of the possibilities is that inherited lncRNAs could simply be carried over following gametogenesis with no specific function in the embryo. Alternatively, they could act as a source of nucleotides in the early embryo following their degradation. Yet another and more exciting possibility is that the inherited lncRNAs might be regulators of specific zygotic genes. To explore whether the inherited lncRNAs are retained post-fertilization or degraded, we analysed the publicly available RNA sequencing datasets that included the post-fertilization stages of zebrafish development (0 hpf, 1 hpf, 2 hpf, 3 hpf, and 4 hpf). The expression patterns of 2093 inherited lncRNAs could be separated into four distinct groups (Fig. 3A; Supplementary File S5, available as supplementary data at Bioinformatics Advances online). It was observed that the majority (∼96%) of inherited lncRNAs lie in groups 1–3 that mostly retain their original expression till 4 hpf and only ∼4% of lncRNAs in group 4 show patterns consistent with degradation. This observation suggests that the majority of lncRNAs are not degraded, at least until ZGA and thus unlikely to be used as a nucleotide source.
Figure 3. Inherited lncRNAs are mostly stable post-fertilization till ZGA. Inherited lncRNAs are broadly expressed across multiple adult tissues. (A) Line graphs showing the expression dynamics of inherited lncRNAs post-fertilization. Individual lncRNA expression profiles are shown as light gray lines and the average expression trend for each group is highlighted as a thick black line. The groups are arranged by the number of lncRNAs assigned to each group (from group 1, the largest, to group 4, the smallest). hpf, hours post-fertilization. (B) Clustered heatmap showing the expression levels of inherited lncRNAs (2093) and non-inherited lncRNAs (3080) in adult tissues of zebrafish. The red arrow highlights a set of inherited lncRNAs derived from ribosomal protein genes that are consistently expressed in all the tissues.
Also, we explored the expression pattern of inherited lncRNA counterparts in adult tissues of zebrafish to understand their specificity and possible functional relevance later in life. We utilized the bulk RNA sequencing datasets of 14 different tissues from four different research groups for this analysis (Supplementary File S1, available as supplementary data at Bioinformatics Advances online). As expected, we found the expression of inherited lncRNAs to be enriched in the gonads (testis and ovary) (Fig. 3B). Unexpectedly, we also found that the majority of inherited lncRNAs show a broad tissue expression pattern that was sharply in contrast with the non-inherited lncRNAs that show much more tissue specificity (Fig. 3B; Supplementary File S6, available as supplementary data at Bioinformatics Advances online).
3.5 Over 250 zebrafish inherited lncRNAs have conserved counterparts in both mouse and human zygotes
To find out whether the zebrafish inherited lncRNAs have conserved counterparts in mouse and human zygotes, we first identified the inherited lncRNAs in mouse and human using criteria similar to those used for zebrafish. We found 3785 (∼7%) inherited lncRNAs in the mouse and 4120 (∼4%) inherited lncRNAs in humans (Supplementary File S7, available as supplementary data at Bioinformatics Advances online). The details of the publicly available RNA sequencing datasets used for this analysis are provided in Section 2 (Supplementary File S1, available as supplementary data at Bioinformatics Advances online). One of the general features associated with lncRNAs is poor sequence conservation. However, syntenic conservation, by virtue of vicinity or overlap to conserved protein-coding genes, is relatively more common. To identify the inherited lncRNAs in zebrafish that may be conserved with mouse and human, three different definitions of conservation were used: sequence conservation, conservation of overlap with protein-coding genes, and syntenic loci conservation (Fig. 4A). We found that 276/2093 inherited lncRNAs of zebrafish have a conserved and inherited counterpart in both mouse and human (Supplementary File S8, available as supplementary data at Bioinformatics Advances online). Consistent with the generalization, we found only 26 out of 2093 zebrafish inherited lncRNAs to be conserved by sequence (Fig. 4B). The sequence identity ranged from 75% to 95% and query coverage ranged from 43 to 745 nucleotides. The majority of conserved inherited lncRNAs (250/276) were exclusively conserved by synteny (overlap conserved or loci conserved). One lncRNA can be conserved by more than one definition, and the Venn diagram shows such overlap between the three conservation categories (Fig. 4B).
Figure 4. Counterparts of more than 250 zebrafish inherited lncRNAs are present in both mouse and human zygotes. (A) Schematic showing different strategies used to identify the conserved counterparts of zebrafish inherited lncRNAs in mouse and human. (B) Venn diagram showing the overlap between the three conservation categories.
3.6 Seven percent of the total identified circRNAs, spread across the genome, are inherited in zebrafish
To explore zebrafish circular RNAs in gametes and zygote, we reanalysed publicly available RNA sequencing datasets of oocytes, sperm, and zygote from the NCBI GEO/SRA database (Supplementary File S1, available as supplementary data at Bioinformatics Advances online). The circRNA detection tool CIRI2 (Gao et al. 2018) was used to identify 3678 circRNAs. A circRNA was called only if at least two backsplice reads supported the junction and the junction ratio was at least 0.05, which is indicative of the expression of a circRNA relative to its linear RNA counterpart (Fig. 5A). A candidate circular RNA was considered inherited only if it was expressed in at least one sample of the zygote, as well as in either or both of the gametes. We identified a total of 270 inherited circRNAs in zebrafish (Fig. 5B; Supplementary File S9, available as supplementary data at Bioinformatics Advances online). Of these, ∼86% (231) were of genic origin, aligning to an annotated protein-coding gene, and ∼14% (198) came from intergenic regions, likely non-coding RNA loci (Fig. 5B). A circRNA was termed genic if even one of its coordinates was located in any annotated gene, irrespective of the location of the other end. Multiple backsplice junctions were identified from a single host gene.
Figure 5. Around 7% of circRNAs in zebrafish are inherited. (A) Schematic description of the bioinformatics analysis for discovery of inherited circular RNAs in zebrafish. (B) Stacked bar plots representing the number of exonic, intronic, and intergenic circular RNAs among all circRNAs and among inherited circular RNAs. The total number of circRNAs is written above each stacked bar, and the number in each subcategory is written on top of the corresponding segment. The schematics of the different types of circRNAs are shown on the right of the stacked bar plots.
3.7 Inherited circRNAs originate from genes involved in fertilization
We mapped the density of inherited circRNA origins across the genome and found that chromosome 3 had the highest number of circRNA loci per million bases across its length (Fig. 6A). circRNAs often regulate the gene of their origin (Liang et al. 2019, Liu et al. 2021). To identify the category of genes that generate inherited circRNAs, we performed a GO analysis at the “biological processes” level for the genic inherited circRNAs. We found that the majority of the host genes were involved in the process of fertilization, such as acrosome reaction, egg coat formation, and zona pellucida interaction with sperm (Fig. 6B).
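The per-chromosome density (loci per million bases) and the 0 to 1000 position scaling used in the Fig. 6 caption can be illustrated as follows. This is a minimal sketch, not the authors' pipeline: the function names and input layout are hypothetical, and any chromosome lengths supplied to it must come from the genome assembly actually used.

```python
from collections import Counter


def circ_density_per_mb(circ_chroms, chrom_lengths_bp):
    """Number of circRNA loci per million bases for each chromosome.

    circ_chroms: iterable with one chromosome name per circRNA locus.
    chrom_lengths_bp: mapping of chromosome name to length in base pairs.
    """
    counts = Counter(circ_chroms)
    return {
        chrom: counts[chrom] / (length / 1_000_000)
        for chrom, length in chrom_lengths_bp.items()
    }


def scaled_position(position_bp, chrom_length_bp, scale=1000):
    """Map a coordinate onto a common 0..scale axis shared by all chromosomes."""
    return position_bp / chrom_length_bp * scale
```

Normalising by length matters because a long chromosome can carry the largest absolute number of loci without being the densest, so count and density are reported separately.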
Inherited circRNAs originate from all chromosomes. (A) Density-wise distribution of inherited circRNAs across chromosome length. Overall density of inherited circRNAs is shown in the background (grey) and the number of inherited circRNAs is shown in the foreground (red). Chromosomes are arranged in order 1–25 from left to right. (B) GO analysis of parent genes of inherited circRNAs. (C) Location-wise distribution of inherited circRNAs across chromosomes. The 25 chromosomes of zebrafish are scaled to the same length, with zero representing the start of the chromosome and 1000 representing the end of the chromosome. The position of each inherited circRNA in the genome is represented by dots (red).
To identify circRNA hotspots in the zebrafish genome, we checked the expression of inherited circRNAs in the zygote according to their position in the genome. The inherited circular RNAs are distributed across all chromosomes in zebrafish, with the maximum number (34) arising from chromosome 3, as seen in Fig. 6A. Chromosomes 1, 5, 10, and 24 exhibit similar expression and distribution patterns across the length of the chromosomes. A number of inherited circRNAs appear to be located near the ends of chromosomes; however, none of these were within the telomeric region (Fig. 6C).
3.8 Inherited circRNAs display stable expression till ZGA
The expression of circRNAs does not always correlate directly with the expression of their linear counterparts, as the back-splicing mechanism can be regulated separately from canonical splicing. We compared the expression of inherited circRNAs and their linear counterparts in the zygote. Overall, the inherited circRNAs exhibit lower expression levels than their linear cognates (Fig. 7A), consistent with the notion that circRNAs are expressed at low levels. Although a subset of these circRNAs is expressed at levels comparable to their linear counterparts, five of the inherited circRNAs show higher expression levels, with four of them originating from the same host gene (Supplementary File S10, available as supplementary data at Bioinformatics Advances online).
Inherited circRNA expression is mostly stable post-fertilization till ZGA. (A) Violin plot showing the expression of all genic inherited circRNAs (231) and their corresponding mRNAs in the zygote. (B) Scatter plot showing the expression levels of 270 inherited circRNAs in the sperm and oocytes. (C) Heatmap showing the expression dynamics of inherited circRNAs post-fertilization. Each row represents the expression of an individual circRNA. Two clusters are marked with red and pink blocks on the right; these show dynamic expression patterns during the early stages of development.
Maternal mRNAs are known to be crucial for early development and are traditionally considered the major contributor of RNAs prior to zygotic genome activation (ZGA). To explore the parental origin of inherited circRNAs, we plotted the expression of the 270 inherited circRNAs in sperm and oocytes. Unlike inherited lncRNAs, of which >95% are present in both sperm and oocytes, only 45% of inherited circRNAs are present in both gametes and are therefore likely to be inherited from both parents. Half (135/270) of the inherited circRNAs are exclusively expressed in oocytes and only 5% are exclusively sperm derived (Fig. 7B; Supplementary File S11, available as supplementary data at Bioinformatics Advances online).
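The parental-origin assignment described above amounts to a presence/absence comparison between the two gametes. The sketch below shows one way to express it. The detection threshold, the function names, and the input layout are assumptions made for illustration only; the text does not state what cut-off the authors used to call a circRNA present in a gamete.

```python
from collections import Counter


def parental_origin(oocyte_expr, sperm_expr, threshold=0.0):
    """Label a circRNA by the gamete(s) in which it is detected.

    threshold is a hypothetical detection cut-off, not a value from the study.
    """
    in_oocyte = oocyte_expr > threshold
    in_sperm = sperm_expr > threshold
    if in_oocyte and in_sperm:
        return "both"
    if in_oocyte:
        return "oocyte_only"
    if in_sperm:
        return "sperm_only"
    return "neither"


def origin_fractions(expr_by_circ, threshold=0.0):
    """Fraction of circRNAs in each origin class.

    expr_by_circ: mapping of circRNA ID to (oocyte_expr, sperm_expr).
    """
    labels = Counter(
        parental_origin(oocyte, sperm, threshold)
        for oocyte, sperm in expr_by_circ.values()
    )
    total = sum(labels.values())
    return {label: count / total for label, count in labels.items()}
```

With the numbers reported here, the oocyte-only class would be 135 / 270 = 0.5, and the three classes (45%, 50%, 5%) together account for all 270 inherited circRNAs.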
circRNAs are believed to be more stable and resistant to degradation due to their covalently closed loop structure. The absence of free 5′ and 3′ ends may protect them from exonucleases. To explore whether inherited circRNAs survive the global RNA degradation preceding zygotic genome activation, we plotted the temporal expression profiles of different clusters of circRNAs during the early developmental stages (0 hpf, 1 hpf, 2 hpf, 3 hpf, and 4 hpf) (Fig. 7C; Supplementary File S12, available as supplementary data at Bioinformatics Advances online). We found that the majority of inherited circRNAs (∼70%) remained stable through the early developmental stages whether they were low or high expressors. We also found two interesting classes of inherited circRNA. The first group of inherited circRNAs were abundant at 0 hpf but expression dropped dramatically at 1 hpf and remained low till 4 hpf suggesting programmed degradation immediately after fertilization (Fig. 7C, red box). A second class of inherited circRNAs was expressed at very low levels at 0 hpf but the expression levels were elevated at 1 hpf and remained steady until 4 hpf (Fig. 7C, pink box). Thus, we observed stable expression in most inherited circRNAs but a small group showed quite dynamic expression patterns in early development (Fig. 7C).
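The three temporal behaviours described above (stable, abundant at 0 hpf then lost, low at 0 hpf then elevated) can be made concrete with a simple fold-change rule. This is a stand-in for illustration and should not be read as the authors' method: the clusters in Fig. 7C come from a heatmap, whereas the rule, the fourfold cut-off, the pseudocount, and the class names below are all assumptions.

```python
import math


def classify_profile(expr, fold=4.0, pseudocount=1.0):
    """Classify an expression time course sampled at 0, 1, 2, 3 and 4 hpf.

    Every later time point is compared with 0 hpf on a log2 scale.
    fold and pseudocount are hypothetical choices for this sketch.
    """
    start = expr[0] + pseudocount
    log_changes = [math.log2((value + pseudocount) / start) for value in expr[1:]]
    limit = math.log2(fold)
    if all(change <= -limit for change in log_changes):
        return "early_decay"   # high at 0 hpf, low from 1 hpf onward
    if all(change >= limit for change in log_changes):
        return "early_rise"    # low at 0 hpf, elevated from 1 hpf onward
    if all(abs(change) < limit for change in log_changes):
        return "stable"
    return "other"
```

The pseudocount keeps the ratio defined for circRNAs with zero counts at 0 hpf, which is exactly the situation of the second dynamic class.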
4 Discussion
In this study, we analysed the inherited lncRNA and circRNA repertoire of zebrafish, which consists of more than 2000 lncRNAs and more than 200 circRNAs, representing ∼20% of all known lncRNAs and 7% of circRNAs in zebrafish. The actual number of inherited long regulatory RNAs could be even higher. In addition to zebrafish, several studies have profiled the gametes and early embryos of diverse species for expression of lncRNAs. For instance, a study identified >3000 lncRNAs expressed in bovine ovarian follicles, oocytes, and early embryos (Wang et al. 2023). Another study identified 1600 lncRNAs in oocytes and early embryos in mice (Karlic et al. 2017). Many of these early expressed lncRNAs are likely to be inherited, as the early embryos across species are transcriptionally inactive. However, their functional significance remains largely unknown, highlighting the need for further investigation. In this study, the inherited lncRNAs were identified by meta-analysis of published RNA sequencing datasets. This meta-analysis focused on consensus between multiple studies to shortlist only the high-confidence inherited lncRNAs. The criterion that a lncRNA must be detected in multiple zygote and gamete RNA sequencing datasets might have led to the exclusion of some bona fide inherited lncRNAs. circRNAs, on the other hand, are rarer than lncRNAs, which themselves are not as abundant as mRNAs. Due to the current lack of comprehensive annotation of circRNAs in zebrafish and the very low expression levels in the samples we used for analysis, we had to relax our criteria to include circRNAs that were expressed in at least one sample of sperm, oocyte, or zygote. We also used these less stringent criteria to annotate a circRNA as inherited. With improvements in genome annotation and deeper RNA sequencing experiments in the future, we anticipate an even higher number of inherited long non-coding RNAs than currently shown in this study.
We find that the lncRNAs have a broad expression range in the zygote, with a minority of lncRNAs showing even higher expression than the median mRNA expression. Surprisingly, the sperm has a diversity and expression of inherited lncRNAs similar to that of the oocytes, and lncRNA inheritance from sperm could be a novel theme in inheritance. However, an important point to consider is that the RNA sequencing libraries of sperm analysed in this study are derived from thousands of spermatozoa, and each individual spermatozoon may contain only a few copies of these lncRNAs. Whether this small number of copies of a lncRNA has functional relevance remains to be seen.
We found more than 250 zebrafish inherited lncRNAs that have a conserved counterpart in the pool of both mouse and human inherited lncRNAs suggesting conserved functions for these inherited RNAs. We explored the functional relevance of inherited lncRNAs. Counterparts of these lncRNAs exist in many adult tissues at high levels. Thus, it does not seem that these lncRNAs are byproducts of oogenesis, inadvertently carried over during fertilization. An intellectually appealing, parsimonious hypothesis is that inherited RNAs may simply be a readily degradable source of nucleotides for the embryo. We looked at the post-fertilization expression pattern of inherited lncRNAs and circRNAs and found that most of these RNAs persist to 4 hpf, past the catastrophic RNA degradation and clearance of MZT. It appears that inherited lncRNAs and circRNAs are not merely a source of nucleotides or carry over of gametogenesis and they could have important RNA related functions in the zygote. A recent study has identified hundreds of sperm and oocyte expressed lncRNAs in mice and humans (Subhash et al. 2020). The study also found stable post-fertilization expression of inherited lncRNAs, particularly those expressed in both the gametes. This finding aligns with our observation in zebrafish regarding the post-fertilization stability of inherited lncRNAs.
While inherited circRNAs exhibit a broad range of expression in the zygote, their expression levels are much lower than those of their linear counterparts, consistent with the general understanding that circRNAs are typically expressed at lower levels. A large subset of these circRNAs was predominantly expressed in oocytes, indicating their maternal origin. Among the inherited circRNAs, we found two classes that showed interesting expression patterns. One group of circRNAs was present at 0 hpf but was degraded rapidly by 1 hpf. This hints at a group of inherited circRNAs with functional relevance during fertilization. The second group was also interesting. From very low or no expression at 0 hpf, these circRNAs appeared to be upregulated at 1 hpf. Since it is known that no new transcription occurs in the zygote until the ZGA, around 4 hpf, this would suggest that active splicing of a small percentage of inherited linear RNAs might be giving rise to these circRNAs.
Modifications on the genomic DNA and histones are considered the major forms of epigenetic transfer of information from one generation to the next. Inheritance of non-coding regulatory RNAs, some of which have a long half-life, presents an alternative pathway for transferring information from the parental generation to the offspring. Here, we provide a carefully annotated resource of inherited lncRNAs and circRNAs, along with information about their conservation and tissue-specific expression. One of the surprising findings was the concordance between the inherited RNA pools in the male and female gametes. While maternal inheritance has been studied in a variety of organisms, the role of sperm-derived RNA has been restricted to recent reports of tRNA-derived small RNAs. Recent studies have reported altered sperm lncRNA profiles in response to paternal stress and high fat diet in mice. These studies hint toward a contribution of paternally inherited lncRNAs to stress-induced behavioral changes (Hoffmann et al. 2024) and high fat diet-induced obesity (An et al. 2017) in the progeny. Furthermore, a study showed an important role of a paternally inherited circRNA named circRNA-1572 in zygotic genome activation of Bos taurus embryos (Wu et al. 2025). This circRNA was found to be a sponge of a microRNA, bta-miR-2478-L-2, and its knockdown led to impaired zygotic genome activation. This study highlights the significance of studying inherited circRNAs and the possibility of discovering novel regulatory pathways during early embryogenesis. In the future, functional studies on inherited lncRNAs and circRNAs can provide novel insights into various fields such as embryonic development, fertility, and inheritance of acquired traits.
Supplementary Material
vbaf139_Supplementary_Data
References
- 1. An T, Zhang T, Teng F et al. Long non-coding RNAs could act as vectors for paternal heredity of high fat diet-induced obesity. Oncotarget 2017;8:47876–89. PMID: 28599310; doi: 10.18632/oncotarget.18138; PMCID: PMC5564612
- 2. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 2014;30:2114–20. PMID: 24695404; doi: 10.1093/bioinformatics/btu170; PMCID: PMC4103590
- 3. Cao Z, Gao D, Xu T et al. Circular RNA profiling in the oocyte and cumulus cells reveals that circARMC4 is essential for porcine oocyte maturation. Aging (Albany NY) 2019;11:8015–34. PMID: 31562810; doi: 10.18632/aging.102315; PMCID: PMC6781969
- 4. Champroux A, Tang Y, Dickson DA et al. Transmission of reduced levels of miR-34/449 from sperm to preimplantation embryos is a key step in the transgenerational epigenetic inheritance of the effects of paternal chronic social instability stress. Epigenetics 2024;19:2346694. PMID: 38739481; doi: 10.1080/15592294.2024.2346694; PMCID: PMC11093028
- 5. Chen Q, Yan M, Cao Z et al. Sperm tsRNAs contribute to intergenerational inheritance of an acquired metabolic disorder. Science 2016;351:397–400. PMID: 26721680; doi: 10.1126/science.aad7977
- 6. Conine CC, Sun F, Song L et al. Small RNAs gained during epididymal transit of sperm are essential for embryonic development in mice. Dev Cell 2018;46:470–80.e3. PMID: 30057276; doi: 10.1016/j.devcel.2018.06.024; PMCID: PMC6103825
- 7. Conine CC, Sun F, Song L et al. MicroRNAs absent in caput sperm are required for normal embryonic development. Dev Cell 2019;50:7–8. PMID: 31265813; doi: 10.1016/j.devcel.2019.06.007
- 8. Corral-Vazquez C, Blanco J, Aiese Cigliano R et al. The RNA content of human sperm reflects prior events in spermatogenesis and potential post-fertilization effects. Mol Hum Reprod 2021;27. PMID: 33950245; doi: 10.1093/molehr/gaab035
