A Multi-Tissue Yak (Bos grunniens) ceRNA Atlas with Ribo-Seq–Informed lncRNA Curation and Candidate Prioritization
Zhenlin Zhu, Biao Li, Mingfeng Jiang

TL;DR
This study maps tissue-specific RNA regulatory networks in yaks, using Ribo-seq to improve accuracy and identify how gene regulation supports adaptation to high-altitude stress.
Contribution
A novel Ribo-seq-informed workflow reduces false positives in ceRNA networks and provides a curated multi-tissue yak ceRNA atlas.
Findings
Tissue-specific ceRNA networks show strong variation in connectivity, with testis having the highest.
Ribo-seq filtering reduced false positives by removing lncRNA candidates with ribosome-occupancy signals.
miRNA usage patterns were more consistent across tissues than ceRNA connections.
Abstract
Yaks live at high altitude, where low oxygen and cold temperatures place long-term stress on multiple organs. How different tissues coordinate gene regulation to support this adaptation is still not fully understood. Here, we analyzed RNA sequencing and microRNA sequencing data from several yak tissues (pooled from three animals per tissue) to build tissue-specific regulatory networks based on the competing endogenous RNA (ceRNA) idea, in which RNAs can influence each other by sharing microRNA binding sites. A major challenge in ceRNA analysis is that some transcripts labeled as “noncoding” by computational prediction may still show evidence of translation, which can create false links in network inference. To reduce this risk, we used ribosome profiling (Ribo-seq) as an additional quality-control step and removed lncRNA candidates with clear ribosome-occupancy signals before…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5- —Key Research and Development Project of Naqu City Science and Technology Bureau
- —Southwest Minzu University Gongga Project
- —Natural Science Foundation of Sichuan Province
- —Fundamental Research Funds for the Central Universities of Southwest Minzu University
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsCancer-related molecular mechanisms research · MicroRNA in disease regulation · High Altitude and Hypoxia
1. Introduction
The yak (Bos grunniens), a keystone livestock species on the Qinghai–Tibet Plateau, has evolved integrated physiological strategies to withstand chronic hypobaric hypoxia, cold stress, and high ultraviolet radiation while maintaining production and reproduction under pastoral conditions [1,2]. At the organ level, plateau adaptation is expressed through coordinated remodeling across multiple systems. The lung supports efficient gas exchange and hypoxia-responsive pulmonary regulation [3,4]; the liver contributes via central metabolic reprogramming that sustains energy homeostasis [5]; the spleen helps maintain immune competence and blood/oxygen-related functions [6]; skeletal muscle, as a major site of oxygen consumption, undergoes hypoxia-associated metabolic and structural adjustment [3]; the testis preserves spermatogenesis and the integrity of the blood–testis barrier under hypoxic pressure [7,8]; and the small intestine supports nutrient acquisition, epithelial barrier maintenance, and metabolic adaptation [9]. Most prior studies have emphasized genomic selection signals or candidate genes; however, this approach is limited in elucidating how multiple organs coordinate their functions under shared environmental pressures to maintain systemic homeostasis [10,11]. Post-transcriptional regulatory networks, especially those mediated by small RNAs, can provide a more proximate readout of tissue states and inter-organ molecular crosstalk, offering actionable clues for multi-organ coordination in plateau adaptation [11].
The competing endogenous RNA (ceRNA) hypothesis proposes that transcripts (mRNAs, lncRNAs, circRNAs, and others) can communicate by competing for shared microRNA response elements, thereby modulating microRNA-mediated repression at the network level [12,13]. Because lncRNAs participate broadly in gene regulation, including cytoplasmic post-transcriptional control, ceRNA network analysis has become a practical framework for multi-omics integration across tissues [14,15]. Nevertheless, ceRNA inference is sensitive to quantitative constraints (binding-site functionality, abundance/stoichiometry, and spatiotemporal co-expression), and violations of these assumptions can inflate false-positive edges and hubs—an issue amplified in multi-tissue studies with limited biological replication [16]. A particularly underappreciated driver of such false positives is node-set contamination: ribosome profiling has revealed widespread ribosome occupancy on transcripts annotated as noncoding, and many lncRNA loci encode translatable small ORFs and functional micropeptides [17,18]. Therefore, restricting the noncoding node set using orthogonal translation evidence is critical for improving the interpretability and reproducibility of ceRNA networks in complex tissue contexts.
In multi-tissue ceRNA studies, minimizing coding contamination in the noncoding node set is essential for improving network interpretability and reproducibility, particularly when biological replication is limited. To this end, we integrated RNA-seq/miRNA-seq with ribosome profiling (Ribo-seq), which provides a quantitative, genome-wide snapshot of ongoing translation and complements transcript abundance by resolving the translatome [19,20]. We used Ribo-seq evidence to screen candidate yak lncRNAs for ribosome occupancy prior to ceRNA inference; transcripts with detectable ribosome association were excluded from the noncoding node set, while archived as candidates for future micropeptide-focused validation. Although ribosome association does not invariably imply a stable protein product, this orthogonal quality-control layer reduces coding contamination and helps mitigate false positives in cross-tissue ceRNA network analysis [17,18].
Despite increasing genomic resources for yak, a cross-tissue, post-transcriptional regulatory atlas remains scarce, and ceRNA analyses in non-model livestock are often challenged by limited replication and uncertainty in noncoding annotations. Integrating RNA-seq/miRNA-seq with ribosome profiling helps bridge this gap by complementing transcript abundance with a genome-wide readout of ribosome occupancy, thereby reducing coding contamination in the noncoding node set and improving interpretability of inferred networks. Here, we provide a unified multi-tissue resource of lncRNAs/circRNAs/miRNAs and a reproducible, Ribo-seq–informed workflow to prioritize candidate regulators for downstream functional validation in yak high-altitude adaptation.
2. Materials and Methods
2.1. Sample Collection
All experimental procedures were approved by the Institutional Animal Care and Use Committee of Southwest Minzu University and were conducted in accordance with relevant national and institutional guidelines for animal care and use.
Tissue samples (spleen, lung, liver, testis, skeletal muscle, and small intestine) were collected from three Maiwa yaks at the Yak Original Breeding Farm in Hongyuan County, Aba Tibetan and Qiang Autonomous Prefecture, Sichuan Province, China. All animals were maintained under routine farm management with veterinary oversight, with ad libitum access to water and feed and shelter appropriate for local climatic conditions. No experimental treatments were applied prior to sampling. Tissue collection was performed by trained personnel, and all efforts were made to minimize distress during handling and sampling. Samples were collected immediately after routine slaughter and were snap-frozen in liquid nitrogen and stored at −80 °C until further processing. For each tissue, equal amounts of material from the three individuals were pooled to generate a composite sample for nucleic acid sequencing and ribosome profiling [21], primarily to reduce library preparation and sequencing costs under budget constraints. Ribosome profiling libraries were successfully obtained for all tissues except skeletal muscle, for which no detectable ribosome-protected footprints were recovered.
To ensure statistical robustness and exclude intestinal tissues, we retrieved high-throughput sequencing data from the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/sra (accessed on 25 August 2024)). These data originated from five tissues—testis, lung, spleen, liver, and muscle—and comprised paired RNA-seq and miRNA-seq datasets with three biological replicates per tissue. Detailed sample information and SRA accession numbers are provided in Supplementary Table S1.
2.2. Preparation of RNA Sequencing Libraries
Total RNA served as the input material for RNA sample preparation. Sequencing libraries were generated using the NEBNext^®^ Ultra Directional RNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA Catalogue #: E7420S) following the manufacturer’s recommendations, and index codes were added to assign sequences to each sample. Briefly, mRNA was purified from total RNA using probes to remove rRNA. Fragmentation was performed with divalent cations at elevated temperature in First Strand Synthesis Reaction Buffer (5X). First-strand cDNA was synthesized with random hexamer primers and M-MuLV Reverse Transcriptase (RNase H). Second-strand cDNA synthesis was then carried out using DNA Polymerase I and RNase H. Remaining overhangs were converted to blunt ends by exonuclease/polymerase activities. After adenylation of the 3′ ends of DNA fragments, NEBNext adaptors with a hairpin loop structure were ligated to prepare the fragments for hybridization. To select cDNA fragments of approximately 370–420 bp, the library fragments were purified with the AMPure XP system (Beckman Coulter, Brea, CA, USA). Size-selected, adaptor-ligated cDNA was treated with 3 µL USER Enzyme (New England Biolabs, Ipswich, MA, USA) at 37 °C for 15 min, followed by incubation at 95 °C for 5 min. The resulting cDNA was then amplified using Phusion High-Fidelity DNA Polymerase with Universal PCR primers and an Index (X) Primer. Finally, PCR products were purified with the AMPure XP system, library quality was assessed on the Agilent 5400 system (Agilent Technologies, Santa Clara, CA, USA), and libraries were measured by qPCR (1.5 nM). Qualified libraries were pooled and sequenced on Illumina platforms using a PE150 strategy at Novogene Bioinformatics Technology Co., Ltd. (Beijing, China), according to the required effective library concentration and data output. Qualified libraries were pooled and sequenced on Illumina platforms using a PE150 strategy at Novogene Bioinformatics Technology Co., Ltd. (Beijing, China), according to the required effective library concentration and data output.
2.3. Preparation of Small RNA Sequencing Libraries
Total RNA served as input material for RNA sample preparation. Sequencing libraries were generated using the NEBNext^®^ Multiplex Small RNA Library Prep Set for Illumina^®^ (Set 1) (New England Biolabs, Ipswich, MA, USA, Catalog #: E7300S). Briefly, 3′ and 5′ adapters were ligated to the 3′ and 5′ ends of the small RNAs, respectively. First-strand cDNA was synthesized following hybridization with the reverse transcription primer. Double-stranded cDNA libraries were produced by PCR enrichment. After purification and size selection, libraries with inserts between 18 and 40 bp were prepared for sequencing on an Illumina platform using the SE50 strategy. Library quality was assessed on an Agilent 5400 system (Agilent, USA) and measured by qPCR (1.5 nM). Qualified libraries were pooled and sequenced on Illumina platforms with the SE50 strategy at Novogene Bioinformatics Technology Co., Ltd. (Beijing, China), according to the effective library concentration and the required data amount.
2.4. Preparation of Ribosome Profiling Libraries
To digest RNAs other than RPFs, cell or tissue lysates were treated with the nonspecific endoribonuclease RNase I. Monosomes were isolated by size-exclusion chromatography using MicroSpin S-400 HR columns. The RNA samples were then treated with an rRNA-depletion kit to remove as much rRNA contamination as possible before PAGE purification of the relatively short (20–38 nt) RPFs. Following PAGE purification, both ends of the RPFs were phosphorylated and ligated to 5′ and 3′ adapters, respectively. The fragments were then reverse-transcribed into cDNAs and amplified by PCR [22,23]. After library construction, library concentration was measured with a Qubit^®^ 2.0 Fluorometer and adjusted to 1 ng/µL. An Agilent 2100 Bioanalyzer was used to assess insert size. Finally, the accurate cDNA library concentration was measured again by qPCR. Once the insert size and concentration matched the required criteria, the samples were submitted for sequencing.
2.5. Quality Control
All sequencing data (both in-house and obtained from NCBI) undergo quality assessment with FastQC (v0.12.1) [24] and are subjected to adapter trimming and quality filtering using Cutadapt (v5.2) [25]. Stringent quality-control procedures ensure high-quality clean reads for downstream analyses. For RNA-seq, stringent filtering criteria were applied to ensure data integrity. Read pairs were removed when either mate exhibited adapter contamination, contained more than 10% ambiguous nucleotides, or harbored over 50% bases with Phred quality scores below 5. For miRNA-seq, high-quality reads were retained by excluding sequences with poly N, those carrying 5′ adapter contamination, lacking the 3′ adapter or insert sequence, displaying poly A/T/G/C features, or exhibiting generally poor base quality; only reads within the appropriate length distribution were used for subsequent analyses. For Ribo-seq, clean reads were obtained by eliminating sequences with 5′ adapters, those missing 3′ adapters or insert sequences, those containing more than 10% ambiguous bases, or those with over 50% nucleotides with Phred scores ≤ 5, followed by removal of 3′ end adapter remnants. All downstream analyses were conducted exclusively on rigorously filtered, high-quality reads.
2.6. Read Mapping and Transcript Assembly
Clean reads were aligned to the yak reference genome assembly Maiwa_STV1.0, available from the Genome Warehouse (GWH), National Genomics Data Center (NGDC) under accession GWHGRYZ00000000.1, using HISAT2 (v2.2.1) [26]. Known splice junctions from the reference annotation were used to construct the index. The aligner was configured in strand-specific mode according to the library preparation strategy, and only the unique reads were retained for subsequent assembly. After sorted and deduplicated BAM files were produced, transcript assembly and expression quantification were performed for each sample individually in reference-guided mode using StringTie (v2.2.3) [27]. Assemblies from all samples were combined with StringTie’s merge function to produce a set of transcriptomes representing the whole genome. Finally, the merged transcriptome was compared and compared to the reference annotation using Gffcompare (v0.11) [28] to classify transcripts and generate a candidate pool for further identification.
Gene-level quantification of mRNA expression was conducted using aligned BAM files. We employed featureCounts (v2.1.1) [29], utilizing exons as counting features to perform fragment-level counting of paired reads. Exon counts were then aggregated to the gene level based on the Parent attribute in the GFF annotation. Only fragments with both ends reliably aligned were retained for further analysis. The resulting gene count matrix was normalized by total exon length and converted to transcripts per million (TPM) [30] to create an mRNA expression matrix that is comparable across samples and tissues.
2.7. Identification of lncRNAs
We obtained the merged transcript set by filtering them in the same manner as in previous studies. First, we compared the merged transcripts one by one with the reference annotation using Gffcompare (v0.11) [28]. Based on their genomic position relative to the annotation, we kept the transcriptions that are intergenic (u), intronic (i), antisense overlap (x), and other noncoding-like (antisense or distal overlaps) with protein-coding loci. We eliminated transcripts that are identical to or strongly correlated with known protein-coding loci on the same strand.
We then applied structural criteria to refine the candidate set. Transcripts shorter than 200 nt were removed, and we focused on multi-exon transcripts with at least two exons. Single-exon transcripts were retained only if they were long and did not overlap so closely with annotated exons. Technical artifacts and low-complexity fragments were reduced. Using genome annotation, we also excluded transcripts that overlapped annotated rRNAs, tRNAs, snoRNAs, snRNAs or miRNAs on the same strand to avoid contamination by small noncoding RNAs.
The remaining candidates are coding potential-assessed using four complementary tools, CPC2 [31], CNCI [32], PLEK [33], and Pfam-A [34], which measure sequence structure and ORF features. The predicted peptide sequences from the longest ORF are compared to Pfam-A using HMMER (v3.3.2) [35] to detect conserved protein domains. Transcripts consistently classified as non-coding by all four tools and without significant Pfam domain hits are high-confidence lncRNAs; all other transcripts are considered potentially coding or ambiguous and removed from the list.
After defining a high-confidence set of lncRNAs, we combined these new lncRNAs with annotated mRNAs and known lncRNAs from the reference genome to construct a single reference genome. We requantified expression in each sample (decoy-aware mode) using Salmon [36], conserving strand specificity parameters used in the HISAT2 alignments. From Salmon-derived TPMs and read counts, we generated an lncRNA expression matrix on both transcriptions and gene levels and checked the corresponding StringTie’s results. A normalized expression matrix was used for downstream tissue-specificity analysis and ceRNA network construction.
2.8. Identification of circRNAs
The identification of circRNAs relies mainly on genomic alignments and the detection of back-spliced junctions (BSJ). In this study, for each sample, we aligned paired-end clean reads to the yak reference genome using BWA-MEM (v0.7.17) [37]. We retained supplementary alignment information and SA tags to facilitate the subsequent detection of back-splicing.
Based on this approach, CIRI2 (v2.0.6) [38] scanned the alignment results to identify candidate circRNAs with canonical back-splice junction (BSJ) features and to determine the number of BSJ-supporting reads for each circRNA in each sample. Subsequently, custom scripts merged CIRI2 outputs from various samples using genomic coordinates, creating a raw BSJ count matrix for circRNAs across all samples. To minimize random noise and false positives, only circRNAs with BSJ support of at least 2 in two or more biological replicates were retained.
For expression quantification, we normalized the circRNA count matrix using edgeR (v3.42) [39]. We applied the TMM [40] method to adjust for differences in sequencing depth between samples and converted expression values to log2 counts per million (log2-CPM). This transformation facilitated inter-tissue comparisons and downstream analyses. To determine circRNA origin types, we conducted an overlap analysis of circRNA genomic coordinates with exon and intron intervals using bedtools (v2.30) [41]. This analysis, performed against a GTF annotation matching the reference genome version, allowed us to classify circRNAs as exonic, intronic, or intergenic. For tissue-specificity analyses, we calculated the τ (tau) index from the normalized expression matrix. CircRNAs with τ ≥ 0.85 were identified as having more concentrated, tissue-biased expression patterns.
2.9. Identification and Target Prediction of miRNA
To obtain clean reads, we used Cutadapt (v5.2) [25] to trim sequences to a length of 18–30 nt [42]. Duplicate sequences were then collapsed and counted using a custom Python (v3.9) script, streamlining subsequent analysis. After collapsing, we employed the mapper.pl script from miRDeep2 (v2.0.1.3) [43], which utilizes Bowtie (v1.3.1) [44], to align the sequences to the yak reference genome, resulting in alignment files in ARF format.
Building on the previous results, bovine mature and precursor miRNA sequences from miRbase [45] served as the reference species, while ovine mature miRNA sequences acted as a proximate reference. We employed miRDeep2 (v2.0.1.3) for miRNA prediction. Following the identification and quantification of known and novel miRNAs, we utilized custom Python scripts to create expression matrices for each sample. These matrices were then normalized using counts per million (CPM) [46] to enhance reliability.
miRNA target prediction was preliminarily screened using the classical 7mer-m8 [47] seed matching strategy. Specifically, nucleotides 2–8 of the mature miRNA were extracted as a 7-nt seed sequence, its reverse-complement sequence was computed, and exact match sites were searched for throughout the full-length sequences of candidate target RNAs. Transcripts containing such match sites were considered potential targets of the miRNA.
2.10. Ribo-Seq Reads Mapping and Quantification of Translation
For ribosomal Western blotting sequencing (Ribo-seq) data, we considered the preprocessed single-ended reads as ribosome-protected fragments (RPF) for alignment and translation quantitative analysis. To minimize the interference from rRNA and tRNA fragments during counting, we first constructed a reference index for rRNA and tRNA using HISAT2 (v2.2.1) [26]. The RPFs were initially aligned to this index. Reads that did not align were then aligned to the yak reference genome in non-splicing mode using HISAT2 (v2.2.1) [26]. We retained only uniquely aligned reads, which were subsequently sorted and indexed with SAMtools (v1.10) [48] to produce the genome alignment file for counting.
Translation activity quantification relies on read counts at the coding sequence (CDS) level. We employed featureCounts (v2.1.1) [29], designating CDS as the counting feature in the GFF annotation. Reads were merged with corresponding transcripts or genes using the Parent field. We set chain-specific parameters and alignment quality thresholds (-s 0 -Q 10) to ensure only high-confidence alignments were counted. After aggregating counts from multiple samples, we converted the original counts to reads per thousand bases (RPK) based on CDS length using a Python script. These were then standardized to transcripts per million (TPM) [30], creating a translation expression matrix comparable across different organizations. By integrating RNA-seq transcript expression data, we identified genes or transcripts with reliable read support in Ribo-seq as “with translation evidence” for further analysis.
2.11. Construction of ceRNA Network
The ceRNA network was constructed using predicted miRNA-target relationships and the count of shared miRNAs between transcripts. We applied a 7mer-m8 seed-matching strategy, extracting nucleotides 2–8 from each mature miRNA as a 7-nt seed sequence. The reverse complement of this seed was then calculated, and exact 7-nt matches to this reverse complement were identified within mRNA 3′UTR regions, as well as within lncRNA and circRNA sequences. These matches were termed miRNA response elements (MREs). Using these annotations, we aggregated the corresponding miRNAs for each transcript and calculated the number of shared miRNAs between any two RNAs. The network included specific RNA pairings: lncRNA–lncRNA, lncRNA–circRNA, lncRNA–mRNA, circRNA–circRNA, and circRNA–mRNA, while mRNA–mRNA pairings were excluded. An RNA pair sharing at least five miRNAs was considered a candidate ceRNA edge.
To confirm the noncoding nature of lncRNA nodes, we combined lncRNA sequences from various tissues, excluding muscle, while preserving tissue information in the sequence identifiers. This created a unified lncRNA transcript reference set, which we indexed using Bowtie2 (v2.4) [49]. We then extracted single-end reads from the Ribo-seq alignment results for each tissue and realigned them to this lncRNA index. Using Samtools (v1.10) [48], we converted, sorted, and indexed the alignment files, tallying the number of reads mapped to each lncRNA transcript. We calculated Reads Per Kilobase per Million mapped reads (RPKM) [50] values for each lncRNA, selecting those with RPKM < 1 [51] as high-confidence noncoding lncRNAs for further expression analyses and ceRNA network construction. Simultaneously, only miRNAs with CPM > 1 in more than two samples and mRNAs with TPM > 1 were retained for the construction of each tissue-specific network. Subsequently, we constructed tissue-specific ceRNA networks for the five non-muscle tissues. These networks used the filtered lncRNAs, circRNAs, and mRNAs as nodes, with RNA pairs meeting the shared-miRNA threshold serving as edges.
3. Results
3.1. Quality Control Results
All sequencing data underwent a comprehensive quality-control workflow. Raw reads from miRNA-seq, RNA-seq, and Ribo-seq were evaluated with FastQC (v0.12.1) and showed consistently high base-quality distributions, expected GC content, and no significant adapter contamination. After processing with Cutadapt (v5.2), miRNA-seq retained over 90% cleaned reads in the vast majority of samples, with length distributions concentrated between 18–30 nt, consistent with small-RNA characteristics. Following adapter removal and low-quality base filtering, RNA-seq and Ribo-seq clean reads remained at high levels and displayed stable Q30 proportions, meeting requirements for downstream gene-expression quantification and ribosome-footprint analyses. Overall, all sequencing datasets met high-quality standards and were suitable for subsequent alignment, quantification, and network-construction analyses.
3.2. The Prediction and Characteristic of LncRNAs
Using multi-tissue transcriptome assembly, we evaluated coding potential with CPC2, PLEK, CNCI, and Pfam, defining high-confidence lncRNAs based on consensus across these tools. We analyzed 17,214 candidate transcripts, with 9360 (54.4%) consistently identified as noncoding, forming a global consensus set (Figure 1A). When aggregated by tissue, the consensus lncRNA set included 10,037 transcripts, slightly more than the global intersection. This suggests that some lncRNAs appeared in multiple tissues or that tissue-specific criteria differed from merged assembly criteria (Supplementary Table S2). Biotype annotation revealed that most consensus lncRNAs were lincRNAs, followed by those of antisense and intronic origins (Figure 1B). Testes had the highest number of consensus lncRNAs, while other tissues had fewer, indicating strong tissue specificity (Figure 1C). The length distribution showed a long-tail pattern, suggesting that a few very long transcripts may raise the overall mean (Figure 1D,E).
3.3. Identification of circRNAs and Their Origin Types and Length Characteristics
Across six tissues, we identified 234 candidate circRNAs (Figure 2A). Tissue distribution showed the most circRNAs in lung (n = 74), followed by testis (n = 50) and muscle (n = 42); liver (n = 34) and spleen (n = 29) had intermediate counts, and small intestine contained only a few (n = 5), indicating pronounced tissue-specific differences in circRNA generation or detection (Figure 2A). Annotation of origin showed that circRNAs were predominantly derived from intergenic regions and introns, accounting for 55.6% (n = 130) and 37.6% (n = 88), respectively, whereas exon-derived circRNAs represented a minor fraction of 6.8% (n = 16) (Figure 2B). Tissue-level compositional analysis mirrored this pattern: intergenic- and intron-derived circRNAs remained dominant across tissues, and exon-derived circRNAs comprised only a small proportion (Figure 2C). Boxplots of length distribution indicated that circRNA lengths were broadly comparable across tissues but exhibited some dispersion and a few long outliers; because the small intestine sample size was limited, its length distribution was less stable and should be interpreted cautiously pending additional samples (Figure 2D). In summary, these results delineated the basic multi-tissue circRNA landscape in yak and provided candidate resources for subsequent analyses of circular RNA regulation related to tissue function.
3.4. Prediction of miRNAs
We identified miRNAs and surveyed their expression profiles using miRNA-seq data from six tissues. A nonredundant set of 1030 miRNAs was detected across all tissues. Tissue-level detection counts revealed pronounced differences in miRNA composition among tissues: the lung showed the highest number of detected miRNAs, whereas muscle showed relatively fewer (Figure 3A), indicating marked tissue dependence of miRNA expression profiles. To characterize representative expression patterns, we normalized and plotted the top 20 most highly expressed miRNAs across all samples. The results showed that conserved members such as the let-7 family, miR-143, miR-26a, and miR-21-5p were relatively stably expressed across multiple tissues, while other miRNAs displayed stronger tissue-specific preferences (Figure 3B,C). Consistently, inter-tissue expression correlation analysis indicated that the miRNA profiles of the small intestine, liver, lung, and testis were relatively similar, whereas muscle differed more substantially from the other tissues (Supplementary Figure S1A). Because correlation can be influenced by sample size and expression filtering, we used this trend primarily to support the conclusion of tissue dependence. Expression-breadth statistics further indicated that miRNAs expressed broadly across all six tissues formed the largest group (approximately 27%), although a subset of tissue-restricted miRNAs remained (Supplementary Figure S1B). Together, these results—from detection repertoire and representative expression patterns to tissue similarity and expression breadth—converge to support the tissue-specific nature of yak miRNAs and provide a foundation for subsequent dissection of miRNA-mediated ceRNA regulatory networks.
3.5. Construction of the ceRNA Network
After excluding muscle tissue, we constructed tissue-specific ceRNA networks for the liver, lungs, spleen, small intestine, and testicles. The network sizes varied significantly across these tissues. Notably, the testis exhibited a much higher number of ceRNA edges compared to other tissues, while the remaining tissues had networks of similar scale (Figure 4A). We combined the lncRNA used for network construction with Ribo-seq recomparison results for non-coding screening, yielding a set of high-confidence lncRNA nodes. Although the number of circRNA nodes was relatively small, they were consistently integrated into the network framework (Figure 4B,C). At the miRNA-target relationship level, miRNA-mRNA interactions formed the primary background, while miRNA-lncRNA and miRNA-circRNA interactions served as bridges for ceRNA connections between different transcript types (Figure 4D). Cross-tissue comparisons revealed that despite each tissue containing numerous ceRNA edges, shared ceRNA pairings across tissues were minimal and predominantly single-tissue-specific, indicating significant tissue dependence in the ceRNA interaction structure (Figure 4E,F).
To assess the evidence strength and composition of network edges, we conducted a boxed count of shared miRNAs for each ceRNA edge. The findings indicated that ceRNA edges meeting the threshold predominantly fell within a medium range of shared miRNAs. In contrast, pairings with a higher number of shared miRNAs constituted a smaller proportion (Supplementary Figure S2A), aligning with our adopted shared miRNA threshold strategy. Additionally, the ceRNA edge types across different tissues were generally consistent. The network was primarily composed of lncRNA-mRNA pairings, while lncRNA-lncRNA and circRNA-involving pairings were less common (Supplementary Figure S2B).
At the miRNA level, the network load exhibits a “head concentration” pattern: a small number of miRNAs connect numerous transcript pairs across various tissues, whereas most miRNAs form only a few connections (Supplementary Figure S3A). An inter-tissue similarity analysis of miRNA loading spectra revealed moderate consistency among the five tissues, yet distinct tissue-specific differences remain (Supplementary Figure S3B). This finding aligns with the general observation that while ceRNA edges are highly tissue-specific, core miRNAs are shared to some extent.
Analysis of the node set’s tissue differences reveals that mRNA is extensively shared across the five tissues, whereas the overlap of screened lncRNA and circRNA between tissues is minimal (Supplementary Figure S4). This indicates that the tissue specificity of the ceRNA network is influenced by variations in non-coding transcript collections and the specific miRNA-target combinations within each tissue.
To clearly present the core interaction structure while maintaining readability, we visually sampled the complete ceRNA network, focusing on local hubs. Initially, we counted the number of miRNAs targeting each transcript type across the network, identifying those with the most miRNA response elements (MREs). From this, we selected the top 20 lncRNAs, top 10 circRNAs, and top 10 mRNAs with the densest connections. In the sub-network formed by these 40 hub targets and their adjacent edges, we recalculated miRNA connectivity and selected the top 20 miRNAs with the densest connections as display nodes. Thus, the miRNA collection in this figure highlights its role in the hub target subnetwork rather than its overall network degree. Consequently, miRNAs with higher degrees in the entire network but primarily connecting to non-hub nodes are not displayed, while those connecting more to hub targets are emphasized, even if their global degrees are moderate. Additionally, besides the testicular representative subnetwork shown in the main text (Figure 5), the highly connected ceRNA subnetworks of other tissues also exhibit a modular structure formed by several hub lncRNAs/target genes sharing miRNAs (Supplementary Figure S5), supporting the consistency of this topological feature across different tissues.
4. Discussion
This study integrates multi-tissue yak transcriptome and small RNA sequencing data to systematically construct tissue-specific ceRNA networks. A central design choice was to use ribosome-association evidence as an orthogonal quality-control layer: before network inference, we removed assembled lncRNA candidates with clear ribosome-occupancy signals and retained a higher-confidence noncoding lncRNA set for ceRNA analysis. This step addresses a practical but often underappreciated issue in ceRNA studies of non-model species—assembled transcripts that are predicted as “noncoding” by coding-potential tools may still carry translational signatures, and their inclusion can inflate shared-target connectivity and introduce annotation-driven false positives. In this context, our unified multi-tissue framework (Figure 1, Figure 2 and Figure 3) shows that divergence in lncRNA/circRNA abundance and composition is a major molecular substrate for network rewiring, consistent with accumulating evidence that spatiotemporal noncoding-RNA programs contribute to organ functional differentiation and shape post-transcriptional network plasticity [15,52]. Building on existing yak multi-tissue transcriptome resources that primarily catalog organ- and altitude-associated expression changes, our study extends the framework to a post-transcriptional, network-level view by explicitly modeling ceRNA topology across tissues. Notably, integrating ribosome profiling as a quality-control layer for lncRNA curation provides a practical safeguard against annotation-driven inflation of network connectivity in non-model livestock, complementing recent efforts to develop multi-tissue yak functional-genomics resources [11,53].
At the network level, we observed pronounced topological differences among tissues (Figure 4). ceRNA edges overlapped only marginally across organs, whereas miRNA hubs were comparatively more stable, together supporting a hierarchical organization of a conserved regulatory core with a tissue-specific periphery. This architecture is biologically plausible: miRNAs often act as broadly deployed post-transcriptional regulators, while lncRNAs and circRNAs—because of their stronger tissue restriction—tend to assemble local modules that tune regulation in a context-dependent manner [54]. This conserved miRNA layer with a tissue-specific noncoding periphery is consistent with quantitative and review literature emphasizing that ceRNA effects are highly dependent on miRNA/target stoichiometry, effective site functionality, and cellular co-localization; consequently, robust biological interpretation is often more defensible at the level of modules and hubs than at individual predicted edges [16,54,55]. The testis network (Figure 5) was particularly dense and highly connected, consistent with the known reliance on post-transcriptional control during germ cell development and gametogenesis and with reports of exceptional transcriptomic complexity and abundant noncoding RNAs in livestock testes [56,57,58]. Consistent with our observation, transcriptome-wide studies in cattle testis have reported extensive stage-dependent lncRNA programs and abundant circRNA expression during spermatogenesis, supporting the notion that noncoding RNAs contribute substantially to post-transcriptional regulation in the male germline [59,60]. Taken together, the topological contrast across tissues likely reflects distinct post-transcriptional strategies aligned with the physiological roles of reproduction, metabolism, and immune homeostasis.
Ribo-seq–guided filtering further revealed a subset of lncRNA candidates with pronounced ribosome occupancy across multiple tissues (Figure 4B and Figure S2), which were excluded from ceRNA network construction by design. Rather than being discarded, these transcripts form an independent pool of ribosome-associated candidates that merit targeted follow-up. Future evaluation should combine ORF features, trinucleotide periodicity, and proteomics support to determine whether any candidates produce stable micropeptides and to clarify their functional modes. This interpretation aligns with the expanding Ribo-seq and mass-spectrometry literature showing that transcripts annotated as noncoding can contribute to noncanonical translation products or translation-associated regulation under specific physiological conditions [20,61,62,63,64]. At the same time, ribosome association alone is insufficient to claim productive translation: the signal may reflect initiation, scanning, or other ribosome-regulatory events rather than stable protein output [65]. For this reason, we treated ribosome occupancy primarily as a filter to reduce potential coding contamination in the lncRNA node set, improving the consistency of ceRNA interpretation, while retaining the ribosome-associated list as a prioritized resource for downstream validation.
Regarding the small intestine, we emphasize that these samples are supplementary and limited in number; accordingly, any direct quantitative comparison or network inference involving this tissue should be interpreted cautiously. Although the small intestine is generally considered miRNA-rich, bta-miR-424-5p was not detected in our small-intestine dataset (Figure S3A). This negative result should not be interpreted as functional irrelevance in the digestive system. A more likely explanation is temporospatial restriction driven by cellular composition, developmental stage, or local microenvironment, as has been observed for multiple miRNAs across species during differentiation and development [54]. Under this view, bta-miR-424-5p may be transiently silenced or present at low abundance in the yak small intestine, whereas its major functional contexts may lie in reproduction, hepatic metabolism, and immune regulation.
Mechanistically, bta-miR-424-5p has been reported to modulate Activin signaling in bovine follicular granulosa cells by targeting SMAD7 and ACVR2A, affecting cell-cycle progression and apoptotic sensitivity [66]. In humans, miR-424 also interfaces with the TGF-β/SMAD axis and has been implicated in angiogenesis and hypoxia-responsive programs, suggesting a partially conserved regulatory logic across mammals [67]. In yak, stage-specific expression of miR-424-5p during testicular development has been described [68], and liver-associated miRNA studies link this miRNA to lipid metabolism and energy homeostasis [69]. More broadly, comparative miRNA-transcriptome analyses between yak and cattle have highlighted miRNA-mediated regulation as a component of high-altitude adaptation, supporting the value of interpreting tissue programs through miRNA-centered regulatory networks [4]. Our tissue-level observations are consistent with these reports: detectable expression in reproductive tissues and liver supports roles in reproductive and metabolic regulation, while signals in spleen and lung point to potential immune-homeostasis involvement, which may be particularly relevant under high-altitude hypoxic stress.
Methodologically, we adopted a shared-target strategy to infer ceRNA topology under limited sample sizes. Compared with correlation-heavy approaches, this strategy can more stably highlight highly connected nodes and tissue-specific modules without requiring large cohorts, and it reduces the susceptibility to noise amplification that can distort correlation estimates in small datasets. Conceptually, it compresses the interaction search space into a topology that is useful for prioritization rather than for edge-by-edge causal claims. This positioning is consistent with recent work emphasizing that ceRNA inference is constrained by miRNA–target stoichiometry, expression thresholds, and cellular co-localization; consequently, interpretation is most defensible at the level of modules and hubs rather than individual edges [16,70].
Taken together, our results position multi-tissue ceRNA mapping as a tractable, hypothesis-generating layer for yak plateau adaptation when interpreted with appropriate quantitative caution. Importantly, the Ribo-seq–informed lncRNA quality-control step provides a transferable template for reducing annotation-driven false positives in ceRNA studies of non-model livestock, while producing prioritized candidate sets for targeted functional validation [16,54]. Several limitations should be acknowledged. First, our in-house multi-tissue dataset used one pooled composite sample per tissue (equal amounts from three animals) primarily to reduce sequencing cost while enabling matched RNA-seq, miRNA-seq, and Ribo-seq from the same individuals. Although small-sample pooling can be cost-efficient in exploratory RNA-seq designs, it masks inter-individual variation and prevents formal estimation of biological variance; therefore, the inferred ceRNA topologies should be interpreted as hypothesis-generating rather than population-level regulatory estimates [21]. Second, ceRNA inference remains intrinsically prediction-driven and sensitive to quantitative constraints (miRNA–target abundance/stoichiometry, site functionality, and cellular co-localization). Even with the Ribo-seq-based lncRNA quality-control step that reduces coding contamination, false-positive edges and hubs may persist, particularly when miRNA binding is approximated by seed matches and when expression is measured in bulk tissues [16,54]. Third, our conclusions are currently supported by computational evidence rather than direct functional assays. Future work should validate prioritized candidates using additional independent animals and orthogonal experiments, including AGO2-RIP/CLIP to confirm miRNA–RNA binding, luciferase reporter assays to test candidate MREs, and perturbation/rescue experiments (e.g., miRNA mimic/inhibitor or ASO/CRISPRi knockdown of candidate lncRNAs/circRNAs) under hypoxia-relevant conditions. For ribosome-associated lncRNA candidates excluded from network construction, ORF-aware ribosome-profiling metrics (e.g., triplet periodicity) combined with targeted proteomics and ORF-disrupting mutagenesis will be needed to distinguish productive micropeptide translation from non-productive ribosome occupancy.
Several practical steps could strengthen this framework. Increasing replicates (especially for the small intestine) and harmonizing sampling conditions will improve cross-tissue comparability, while single-cell or spatial data can reduce tissue heterogeneity effects. AGO-RIP or CLIP assays would add direct binding evidence for key miRNA–RNA links. For ribosome-associated lncRNA candidates filtered out here, ORF-aware analyses with calibrated Ribo-seq metrics and targeted proteomics or ORF-disrupting mutagenesis will be needed to distinguish transient ribosome occupancy from productive micropeptide translation and to evaluate tissue-specific relevance.
5. Conclusions
In conclusion, we established a unified multi-tissue framework to catalog yak lncRNAs, circRNAs, and miRNAs and to construct tissue-specific ceRNA networks. By using Ribo-seq evidence as an orthogonal quality-control layer, we removed lncRNA candidates with clear ribosome-association signals and retained a high-confidence noncoding lncRNA subset for ceRNA inference, thereby reducing potential annotation-driven false positives. Cross-tissue comparisons indicate that ceRNA edges are strongly tissue-dependent with minimal overlap, whereas miRNA usage/load patterns are comparatively more conserved, consistent with a hierarchical organization featuring a conserved regulatory core and a tissue-specific periphery. The ribosome-associated transcripts excluded by filtering form an independent candidate pool for future evaluation of translational involvement, although ribosome association alone does not imply stable protein products. Limitations include the supplementary nature and limited sample size of the small-intestine dataset, the exclusion of muscle due to insufficient Ribo-seq support for consistent filtering, and the inherent constraints of prediction-based network inference. Overall, our study provides reusable multi-omics resources and an evidence-aware prioritization scheme that support downstream experimental validation of post-transcriptional regulation and translation-associated candidates in yak, with relevance to physiological adaptation to high altitude. Concretely, our evidence-aware prioritization ranks candidates by (i) passing the Ribo-seq-based noncoding quality control for ceRNA modeling (and, separately, retaining ribosome-associated lncRNA candidates as a translation-focused list), (ii) network-level support such as high miRNA-target load and hub/central positions within tissue-specific subnetworks, and (iii) physiological relevance of associated mRNA partners to hypoxia response, metabolism, immunity, or reproduction based on functional annotation. We propose a staged validation strategy: first confirming tissue expression in additional animals by qRT-PCR; then validating key miRNA–RNA interactions using AGO2-RIP/CLIP and MRE reporter assays; and finally testing regulatory effects via perturbation/rescue experiments (miRNA mimic/inhibitor; ASO/CRISPRi knockdown of candidate lncRNAs/circRNAs) in hypoxia-relevant models. For translation candidates, targeted proteomics and ORF-disrupting experiments will be used to verify micropeptide production and function.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Ayalew W. Chu M. Liang C. Wu X. Yan P. Adaptation mechanisms of yak (Bos grunniens) to high-altitude environmental stress Animals 202111234410.3390/ani 1108234434438801 PMC 8388626 · doi ↗ · pubmed ↗
- 2Naz S. Chatha A.M.M. Ullah Q. Farooq M. Jamil T. Muner R.D. Kiran A. Genomic adaptation, environmental challenges, and sustainable yak husbandry in high-altitude pastoral systems Vet. Sci.20251271410.3390/vetsci 1208071440872665 PMC 12390255 · doi ↗ · pubmed ↗
- 3Xin J.-W. Chai Z.-X. Zhang C.-F. Zhang Q. Zhu Y. Cao H.-W. Ji Q.-M. Zhong J.-C. Transcriptome profiles revealed the mechanisms underlying the adaptation of yak to high-altitude environments Sci. Rep.20199755810.1038/s 41598-019-43773-831101838 PMC 6525198 · doi ↗ · pubmed ↗
- 4Guan J. Long K. Ma J. Zhang J. He D. Jin L. Tang Q. Jiang A. Wang X. Hu Y. Comparative analysis of the micro RNA transcriptome between yak and cattle provides insight into high-altitude adaptation Peer J 20175 e 395910.7717/peerj.395929109913 PMC 5671665 · doi ↗ · pubmed ↗
- 5Ma J. Song N. Huang N. Rong Z. Li J. Wang S. Zhang X. Wei Q. Chen J. Effective energy harvesting of Yak by regulating the glucose and lipid metabolism in liver to adapt to the plateau environment Sci. Rep.20251694410.1038/s 41598-025-30526-z 41345509 PMC 12783150 · doi ↗ · pubmed ↗
- 6Zheng Y. Guan J. Wang L. Luo X. Zhang X. Comparative proteomic analysis of spleen reveals key immune-related proteins in the yak (Bos grunniens) at different growth stages Comp. Biochem. Physiol. Part D Genom. Proteom.20224210096810.1016/j.cbd.2022.10096835150973 · doi ↗ · pubmed ↗
- 7Ma R. Cui Y. Yu S.-J. Pan Y.-Y. He J.-f. Wang Y.-y. Zhao L. Bai X.-f. Yang S.-s. Whole transcriptome sequencing revealed the gene regulatory network of hypoxic response in yak Sertoli cells Sci. Rep.2024141990310.1038/s 41598-024-69458-539191828 PMC 11349989 · doi ↗ · pubmed ↗
- 8Ma R. Cui Y. Yu S.-J. Pan Y.-Y. He J.-f. Wang Y.-y. Wang J.-l. Wang X.-y. Bai X.-f. Zhang H. The glucose metabolism reprogramming of yak Sertoli cells under hypoxia is regulated by autophagy BMC Genom.20252638510.1186/s 12864-025-11497-x 40251498 PMC 12007286 · doi ↗ · pubmed ↗
