Transcriptomic dataset of two types of coelomocytes from Ophelia limacina (Rathke, 1843)
Daniil Khlebnikov, Polina Belova, Andrey Anisenko

TL;DR
This paper presents transcriptomic data from two types of coelomocytes in the annelid Ophelia limacina, revealing differences in gene expression between amoebocytes and rod-bearing cells.
Contribution
The study provides a de novo assembled transcriptome and identifies differentially expressed genes between two coelomocyte types in Ophelia limacina.
Findings
Transcriptomic data was collected from amoebocytes and rod-bearing coelomocytes in Ophelia limacina.
Differential expression analysis revealed significant gene expression differences between the two coelomocyte types.
A de novo transcriptome assembly was generated using Trinity for O. limacina coelomocytes.
Abstract
Coelomocytes are a group of specific cells found in the coeloms of invertebrate animals. They perform a variety of functions including immune response, storage, detoxification, and transport of different compounds. Ophelia limacina, a member of the phylum Annelida, has two morphologically distinct types of coelomocytes: one of which is amoebocytes, typical for annelids, and the other is rod-bearing cells containing an unusual intracellular structure – a central brown-colored rod. Here we present the transcriptomic data of both cell populations. For this purpose, a total of 148 individuals were collected in the White Sea. The coelomocytes were isolated, both populations of cells (amoebocytes and rod-bearing cells) were enriched using a manual method and then RNA was extracted for NGS-library preparation and further sequencing. Based on obtained data we de novo assembled a representative…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsInvertebrate Immune Response Mechanisms · Marine Bivalve and Aquaculture Studies · Protist diversity and phylogeny
Specifications TableSubjectBiologySpecific subject areaZoology, transcriptomicsType of dataSequence reads obtained after Illumina sequencing of RNA samples, FASTQ files Analyzed data: differential expression analysis, Table, Images, GraphData collectionThe collection of Ophelia limacina was carried out at a depth of 0.5–1 m in June 2024. In total, 148 individuals were collected. The two populations of coelomocytes amoebocytes and rod-bearing cells were enriched and total RNA was extracted from them. This RNA was used for RNA sequencing.Data source locationThe material was collected at sublittoral of Podvolochie Inlet in White Sea near the White Sea Biological Station, Lomonosov Moscow State University 66^о^44’38.3’’N, 33^о^33’32.8’’E.Data accessibilityThe European Nucleotide Archive (ENA), Project: PRJEB102134 [https://www.ebi.ac.uk/ena/browser/view/PRJEB102134]Zenodo [https://doi.org/10.5281/zenodo.16418226]Related research articleNone
Value of the Data
2
- •These data are important for investigation of invertebrate coelomocyte functions, including immune, nutrient transport and storage, detoxification, etc.
- •The representative transcriptome of coelomocytes from Ophelia limacina can be used for comparative studies in invertebrate immunology.
- •The differential gene expression profiles of amoebocytes and rod-bearing cells obtained can be used for further evaluation of functional differences of these cell types.
Background
3
Coelomocytes are a group of cells found in coelomic fluids of different invertebrate animals including annelids. These cells perform a variety of functions among which immune, storage, detoxification and transportation of nutrients, oxygen etc. [1,2]. In different organisms these functions may be combined in one type of coelomocytes or specialization can occur, when a certain function is performed by a certain type of cells. The most studied one is immune function, that involves both direct neutralization of the pathogen through phagocytosis or encapsulation, and indirect action via the secretion of peptides, proteins, and other immunoreactive molecules [1,2].
O. limacina belonging to Opheliidae family (Fig. 1, A) contains two different types of coelomocytes, one is amoebocytes found in different polychaetes and another one is a rod-bearing cells [3] (Fig. 1, B). The last one contains unique central brown-colored rod and varies greatly in size from 10 to 350 µm. Functions of such type of cells remain unknown [3]. Here we for the first time described the transcriptomic data for two types of O. limacina coelomocytes and differentially expressed genes in both subpopulations of them. These data may shed some light on the functions of both types of coelomocytes.Figure 1. Transcriptomes of two types coelomocytes from Ophelia limacina. A – Appearance of O. limacina. B – Purity of amoebocytes (amb) and rod-bearing cells (rbc) populations used in study for further transcriptomic characterization. The amb samples contain over 95% of amoebocytes, as assessed by light microscopy. Rod-bearing cells comprised about 75% of the enriched rbc samples. C – BUSCO scores before and after decontamination of assembly. D – Principal component analysis (PCA) of transcriptome-wide normalized gene expression counts assessing the difference in amb and rbc cells. E – Differential transcript expression in amb and rbc cells. Amb-specific transcripts (log2FC < –1 and adj. p-value < 0.05) are highlighted in purple, while rbc-specific transcripts (log2FC > 1 and adj. p-value < 0.05) are displayed in green.Figure 1 dummy alt text
Data Description
4
To characterize transcriptomic differences between two types of Ophelia limacina coelomocytes –amoebocytes and rod-bearing cells – we developed a manual protocol for separating these populations (described in “EXPERIMENTAL DESIGN, MATERIALS AND METHODS” section). Upon implementation of this protocol, we obtained enriched populations of the cells: amoebocytes with a purity of 95% or greater and rod-bearing cells with about 75% purity assessed by light microscopy (Fig. 1, B). Total RNA from both populations of O. limacina coelomocytes was isolated in triplicate. NGS libraries were prepared from the polyadenylated fraction of RNA and sequenced to yield a total of 81.1 Gb reads. Obtained data were deposited at the European Nucleotide Archive (ENA), Project: PRJEB102134.
Trinity software [4] was used to perform de novo assembly of representative transcriptome of O. limacina coelomocytes. After decontamination and collapse of similar sequences, the final assembly with almost 350-fold coverage was obtained. The decontamination of obtained data led to a reduction of duplicated genes (Fig. 1, C). Decontaminated assembly deposited in Zenodo repository together with counts data doi.org/10.5281/zenodo.16418226.
The final transcriptome of O. limacina coelomocytes comprised 447,042 genes and 511,699 transcripts according to Trinity terminology [4]. Among them 29,196 transcripts corresponding to 28,872 genes were identified as protein coding with peptide lengths exceeding 100 amino acids. Notably, 40% of identified peptides had a BLASTP hit against the Uniprot/Swiss-Prot database (e-value threshold of 1e-5) and 36% peptides had at least one domain according to Pfam database. The remaining peptides, constituting 54% of the total number, exhibited no significant sequence homology to known proteins in the databases. Possibly, these proteins are lineage specific.
To quantitatively characterize the expression levels of transcripts in amoebocytes and rod-bearing cells, the reads were mapped to the assembled transcriptome. Additionally, we filtered out transcripts with expression levels below 1 TPM in at least two out of three replicates for each cell population. After processing, transcripts of 48,791 genes remained in the dataset. The total length of transcriptome, calculated by considering only one isoform with the highest expression level per gene in each cell type, was 26.37 Mb.
The cross-sample normalized values (TMM, in transcript per million units) after mapping reads are available at Zenodo repository doi.org/10.5281/zenodo.16418226. Principal component analysis (PCA) following regularized-logarithm transformation demonstrated that the samples exhibited clear clustering within the same cell type and each replicate was suitable for further analysis (Fig. 1, D).
Differential expression analysis was performed using the DESeq2 package [5] by calculating the log2(Fold change) values for transcript abundance and adjusted p-values. The volcano plot illustrates the differentially expressed genes between amoebocytes and rod-bearing cells, highlighting those with absolute log2FC exceeding 1 and adjusted p-value below 0.05 (Fig. 1, E). In amoebocytes, 2,843 up-regulated protein-coding genes were identified, among which 872 (30.67%) had BLAST or Pfam hits. Similarly, in rod-bearing cells, 1,538 up-regulated protein-coding genes were detected, including 795 (51.69%) with BLAST or Pfam hits.
Experimental Design, Materials and Methods
5
Collection of material
5.1
The 148 adult specimens of O. limacina were collected at sublittoral (0.5-1 m depth) of Podvolochie Inlet (66^о^44’38.3’’N, 33^о^33’32.8’’E) in White Sea near the White Sea Biological Station, Lomonosov Moscow State University, in June 2024. The individuals were maintained in a fresh sea water at 10-12°C for 1 week with daily replacement of water to the fresh one until cell isolation.
Coelomocytes isolation and RNA extraction
5.2
To isolate coelomocytes, O. limacina specimens were placed into 200 µL of 0.22 µm filtered sea water in the 24-well cell culture plate (1 specimen per well). The body wall was dissected with microsurgical scissors (MST, USA) to extract coelomic fluid containing the cells, after which the specimen was removed from the well. Massive contaminations of coelomic fluids with gametes or parasites such as gregarines were verified by light microscopy using Nikon Eclipse TS100 inverted microscope (Nikon, Japan). The 34 samples that did not pass these criteria were further excluded from the analysis. Amoebocyte and rod-bearing cell populations were enriched by manual method based on their different ability in adhesion to the surface of TC-treated plates (Fig. 1, B). Implementation of automatic assays for cell populations separation, e.g. FACS, was impossible due to giant size of rod-bearing cells up to 350 µm [3]. Amoebocytes settled down and attached to the well surface within 10 minutes, after which the coelomic fluid containing rod-bearing cells was collected manually using Pasteur pipete in 1.5 mL tubes under light microscopy control. Amoebocytes were washed with filtered sea water and lysed directly in wells by TRIzol reagent (Invitrogen, USA). The final purity of amoebocyte samples was greater than 95 % for each wells, which was checked using light microscopy. Cell lysates were pooled in three tubes so that each contained lysates from 38 wells. Rod bearing cells from the same wells also were pooled and then centrifuged at 100 g for 5 min, washed once with filtered sea water and lysed by TRIzol reagent (Invitrogen, USA). The content of rod-bearing cells in rbc samples was no less than 75 %. All the steps above were carried out at 4°C.
The RNA extraction from the samples was performed using TRIzol reagent according to the manufacturer’s protocol. To remove genomic DNA from the samples the extracted RNA was treated with DNase I (Thermo, USA) with followed DNase I removal by CleanRNA Standard kit (Evrogen, Russia). The integrity of RNA was analyzed with 1 % agarose gel electrophoresis in TAE buffer with ethidium bromide staining, and on-chip electrophoresis (Bioanalyzer 2100, Agilent). The RIN data are presented in Table 1.Table 1RNA-seq samples, read metrics, and public ENA sample accessions.Table 1 dummy alt textSample nameCell typeRINNumber of raw read pairs (M)Number of quality trimmed reads (M)Uniquely mapped reads,%ENA sample accessionamb1Amoebocytes8.342.842.787.2SAMEA120473375amb2Amoebocytes8.447.947.991.1SAMEA120473376amb3Amoebocytes8.342.242.184.6SAMEA120473377rbc1Rod-bearing cells8.451.751.689.8SAMEA120473378rbc2Rod-bearing cells8.440.040.091.9SAMEA120473379rbc3Rod-bearing cells8.345.945.996.1SAMEA120473380
Illumina library preparation and sequencing
5.3
The cDNA libraries were prepared from the polyadenylated RNA fraction using TruSeq2 chemistry according to manufacturer’s protocol. The average insert size for all samples was 300-350 bp. Library amplification was carried out for 12 cycles. The final concentration of the libraries ranged from 2.3 to 4.2 ng/µL. The obtained libraries were sequenced on an Illumina NovaSeq6000 instrument with 150 bp long paired-end reads and a yield of at least 40 million reads. Raw reads from the instrument were converted to FASTQ format with bcl2fastq2 software (Illumina). The obtained data are summarized in Table 1.
Transcriptome assembly and annotation
5.4
The process of de novo transcriptome assembly was carried out utilizing an adapted workflow originally detailed by E. Nikitenko et al. [6]. To ensure complete transparency, we provide a comprehensive overview of the entire methodology employed for processing raw sequencing data. The initial stage involved preprocessing of paired-end reads through a series of quality control steps. Error correction was initially performed using Rcorrector [7] version 1.0.6, targeting k-mer inaccuracies. Subsequently, read pairs containing uncorrectable errors or flagged k-mers were excluded from further analysis. Quality trimming was executed via TrimGalore version 0.6.10, employing specific parameters (“–length 36 -q 5 –stringency 1 -e 0.1”) to filter out sequences of suboptimal quality or insufficient length. Following quality control, reads were subjected to ribosomal RNA filtering using bowtie2 [8] version 2.5.4 against a rRNA database comprising SILVA 138.2 large and small subunit rRNA sequences (LSUParc and SSUParc). Only non-matching reads proceeded to the assembly stage. De novo assembly was conducted using Trinity [4] version 2.15.2 with standard parameters. Post-assembly processing included transcript collapsing with cd-hit-est [9] version 4.8.1, applying a 95% identity threshold to remove sequence polymorphisms. Contamination removal was performed using MCSC [10], focusing on Annelida phylum with a clustering parameter n = 4. The resulting clean assembly served as input for transcript quantification analyses. Open reading frame (ORF) prediction was carried out using TransDecoder v5.5.0, retaining sequences exceeding 100 amino acids. Functional annotation was performed through the Trinotate [11] v4.0.2 pipeline, incorporating: protein homology searches via diamond [12] v2.0.15 (BLASTP and BLASTX) against UniProtKB/SwissProt, domain analysis using hmmer [13] v3.3.2, functional annotation through eggNOG [14] v2.1.9 against the Pfam database. This multi-step approach ensured comprehensive processing and accurate annotation of the generated transcriptome dataset.
Identification and analysis of differentially expressed genes
5.5
Differential gene expression was conducted utilizing the DESeq2 [15] package after the raw data processing. The workflow comprised several sequential steps. First, the raw reads were mapped to the transcriptome assembly using bowtie2 v2.5.4. Subsequently, contig abundance estimation was performed employing salmon [8] v1.10.3. The resulting data were then processed using DESeq2. Visualization of results was accomplished using the ggplot2 package in R. Transcripts were deemed as differentially expressed if the expression level changed more than 2-fold (Log2FC > 1) between groups and the adjusted p-value was less than 0.05.
Quality control
5.6
FastQC was used to assess sequencing quality before and after adapter trimming and filtering for quality and length. The quality of the assembly was checked by mapping the source reads to the assembly as well as by BUSCO [16].
Limitations
In this study, we employed a manual separation protocol to isolate amoebocyte and rod-bearing cell populations of Ophelia limacina coelomocytes for subsequent transcriptomic analysis. The method utilized the distinct adhesion properties of these cell populations to TC-treated plates. Quality assessment demonstrated that the amoebocyte population achieved purity exceeding 95%, while the rod-bearing cell population showed only 75% enrichment. We draw researchers’ attention to the fact that the lower purity level of the rod-bearing cell population may introduce potential biases in the transcriptomic profiles.
Ethics Statement
The authors have read and follow the ethical requirements for publication in Data in Brief and confirming that the current work does not involve human subjects, animal experiments, or any data collected from social media platforms. Ophelia limacina collection did not require specific licensing, however, all the experimental procedures, including material collection, were approved by the Scientific Council of the White Sea Biological Station. All applicable international, national, and/or institutional guidelines for the care and use of animals were followed.
Credit Author Statement
Daniil Khlebnikov: Investigation, Data curation, Methodology, Formal analysis, Software. Polina Belova: Investigation, Methodology. Andrey Anisenko: Investigation, Validation, Supervision.
Funding
This work was supported by Interdisciplinary Scientific and Educational Schools of Moscow University No. 23-Ш04-29.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Sima P.Větvička V.Cooper E.Bilej M.Roch P.Annelid coelomocytes and hematocytes: roles in cellular immune reactions Immunology of Annelids 1993 CRC Press Boca Raton, Fla, USA 115165
- 2Tahseen Q.Coelomocytes: Biology and Possible Immune Functions in Invertebrates with Special Remarks on Nematodes Int. J. Zool 12009
- 3Belova P.Zhadan A.A novel type of polychaete coelomocytes found in Ophelia limacina (Rathke, 1843)Dokl Biol Sci 44020113132213482010.1134/S 0012496611050139 · doi ↗ · pubmed ↗
- 4Nikitenko E.Borisenko I.Anisenko A.Vortsepneva E.Transcriptomic sequence dataset of a potential new model species for studying biomineralization Onchidoris muricata (Nudibranchia, Gastropoda, Mollusca)Data Brief 11202411052610.1016/j.dib.2024.110526 PMC 1112716838799714 · doi ↗ · pubmed ↗
- 5Haas B.Papanicolaou A.Yassour M.Grabherr M.Blood P.Bowden J.Couger M.Eccles D.Li B.Lieber M.Mac Manes M.Ott M.Orvis J.Pochet N.Strozzi F.Weeks N.Westerman R.William T.Dewey C.Henschel R.Le Duc R.Friedman N.Regev A.De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis Nat Protoc 8201314942384596210.1038/nprot.2013.084PMC 3875132 · doi ↗ · pubmed ↗
- 6Love M.Huber W.Anders S.Moderated estimation of fold change and dispersion for RNA-seq data with DE Seq 2Genome Biol 1520145502551628110.1186/s 13059-014-0550-8PMC 4302049 · doi ↗ · pubmed ↗
- 7Song L.Florea L.Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads Gigascience 42015482650076710.1186/s 13742-015-0089-y PMC 4615873 · doi ↗ · pubmed ↗
- 8Langmead B.Trapnell C.Pop M.Salzberg S.Ultrafast and memory-efficient alignment of short DNA sequences to the human genome Genome Biol 102009 R 251926117410.1186/gb-2009-10-3-r 25PMC 2690996 · doi ↗ · pubmed ↗
