The chromosomal genome sequence of the carnivorous sponge, Lycopodina hypogea (Vacelet & Boury-Esnault, 1996) (Poecilosclerida: Cladorhizidae) and its associated microbial metagenome sequences

Thierry Pérez; Jean Vacelet; Dirk Erpenbeck; Ute Hentschel; Graeme Oatley; Elizabeth Sinclair; Eerik Aunin; Noah Gettle; Camilla Santos; Michael Paulini; Haoyu Niu; Victoria McKenna; Rebecca O’Brien; Jayan Duminda M Senevirathna; Estelle Proux-Wéra; Poppy Hesketh-Best; Emily C Giles

PMC · DOI: 10.12688/wellcomeopenres.25959.1 · February 17, 2026

Open Access

TL;DR

This paper presents the genome sequence of a carnivorous sponge and its associated microbial community, revealing insights into its genetic makeup and symbiotic relationships.

Contribution

The study provides a high-quality chromosomal genome assembly and metagenomic analysis of Lycopodina hypogea and its microbial symbionts.

Findings

1. The sponge genome assembly spans 235.10 Mb, with 98.85% of the sequence in 15 chromosomal pseudomolecules, and includes 16,317 protein-coding genes.

2. The metagenome analysis identified 27 high-quality microbial genomes, including symbionts such as Candidatus Spongiihabitans and Candidatus Poriferisodalaceae.

Abstract

We present a genome assembly from an individual Lycopodina hypogea (carnivorous sponge; Porifera; Demospongiae; Poecilosclerida; Cladorhizidae). The genome sequence has a total length of 235.10 megabases. Most of the assembly (98.85%) is scaffolded into 15 chromosomal pseudomolecules. The mitochondrial genome has also been assembled, with a length of 31.1 kilobases. Gene annotation of this assembly by Ensembl identified 16 317 protein-coding genes. From the metagenome data we recovered 39 bins, of which 27 were high-quality metagenome-assembled genomes (MAGs), including four fully circularised genomes. The MAGs included archaea and bacteria involved in nitrification and sulfate reduction, as well as known sponge symbionts affiliated with Gammaproteobacteria (Candidatus Spongiihabitans, Porisulfidus) and Acidimicrobiales (Candidatus Poriferisodalaceae), among others.

Figures

Figure 1. Image of the Lycopodina hypogea (odLycHypo2) specimen used for genome sequencing (photograph by Thierry Pérez).

Figure 2. Frequency distribution of k-mers generated using GenomeScope2. The plot shows observed and modelled k-mer spectra, providing estimates of genome size, heterozygosity, and repeat content based on unassembled sequencing reads.

Figure 3. Hi-C contact map of the Lycopodina hypogea genome assembly. Assembled chromosomes are shown in order of size and labelled along the axes, with a megabase scale shown below. The plot was generated using PretextSnapshot.

Figure 4. Assembly metrics for odLycHypo2.1. The BlobToolKit snail plot provides an overview of assembly metrics and BUSCO gene completeness. The circumference represents the length of the whole genome sequence, and the main plot is divided into 1 000 bins around the circumference. The outermost blue tracks display the distribution of GC, AT, and N percentages across the bins. Scaffolds are arranged clockwise from longest to shortest and are depicted in dark grey. The longest scaffold is indicated by the red arc, and the deeper orange and pale orange arcs represent the N50 and N90 lengths. A light grey spiral at the centre shows the cumulative scaffold count on a logarithmic scale. A summary of complete, fragmented, duplicated, and missing BUSCO genes in the metazoa_odb10 set is presented at the top right. An interactive version of this figure can be accessed on the BlobToolKit viewer.

Figure 5. BlobToolKit GC-coverage plot for odLycHypo2.1. Blob plot showing sequence coverage (vertical axis) and GC content (horizontal axis). The circles represent scaffolds, with the size proportional to scaffold length and the colour representing phylum membership. The histograms along the axes display the total length of sequences distributed across different levels of coverage and GC content. An interactive version of this figure is available on the BlobToolKit viewer.

Figure 6. Blob plot of base coverage mapped against GC proportion for sequences in the Lycopodina hypogea metagenome. Binned contigs are coloured by family. Circles are sized in proportion to sequence length on a square-root scale, ranging from 510 to 6 879 066. Histograms show the distribution of sequence length sum along each axis. An interactive version of this figure may be viewed here.

Figure 7. Taxonomic tree based on taxonomic classifications of metagenome bins, constructed using ete3. Colours indicate phylum-level taxonomy. Tracks show genome completeness (blue), sequencing coverage (red, log10), and genome size (grey bars, Mbp). High-quality MAGs are marked with grey circles; fully circularised MAGs in black.

Tables

Table 1. Specimen and sequencing data for BioProject PRJEB72483.

Platform	PacBio HiFi	Hi-C	RNA-seq
ToLID	odLycHypo2	odLycHypo4	odLycHypo8
Specimen ID	GHC0000171	GHC0000173	GHC0000177
BioSample (source individual)	SAMEA9463981	SAMEA9463983	SAMEA9463987
BioSample (tissue)	SAMEA9463992	SAMEA9463994	SAMEA9463998
Tissue	whole organism	whole organism	whole organism
Instrument	Revio	Illumina NovaSeq 6000	Illumina NovaSeq X
Run accessions	ERR12666653; ERR14209102	ERR12668765	ERR13669965
Read count total	17.91 million	828.98 million	102.06 million
Base count total	138.64 Gb	125.18 Gb	15.41 Gb

Table 2. Genome assembly statistics.

Assembly name	odLycHypo2.1
Assembly accession	GCA_963969325.1
Alternate haplotype accession	GCA_963969335.1
Assembly level	chromosome
Span (Mb)	235.10
Number of chromosomes	15
Number of contigs	2 129
Contig N50	0.17 Mb
Number of scaffolds	199
Scaffold N50	16.02 Mb
Organelles	Mitochondrion: 31.1 kb
BUSCO (using the metazoa_odb10 reference)	C:69.0% [S:68.3%; D:0.6%]; F:11.6%; M:19.4%; n:954

Table 3. Chromosomal pseudomolecules in the primary genome assembly of Lycopodina hypogea odLycHypo2.

INSDC accession	Molecule	Length (Mb)	GC%
OZ017779.1	1	21.07	39
OZ017780.1	2	16.99	39
OZ017781.1	3	16.60	39.50
OZ017782.1	4	16.47	39
OZ017783.1	5	16.37	39
OZ017784.1	6	16.29	39
OZ017785.1	7	16.02	38.50
OZ017786.1	8	15.84	38.50
OZ017787.1	9	15.31	39.50
OZ017788.1	10	14.45	38
OZ017789.1	11	14.39	38.50
OZ017790.1	12	13.51	38.50
OZ017791.1	13	13.26	39
OZ017792.1	14	12.94	38.50
OZ017793.1	15	12.89	39.50

Funding

- Wellcome Trust
- Gordon and Betty Moore Foundation

Keywords

Lycopodina hypogea; carnivorous sponge; genome sequence; chromosomal; Poecilosclerida; microbial metagenome assembly

Taxonomy

Topics: Marine Sponges and Natural Products · Genomics and Phylogenetic Studies · Microbial Natural Products and Biosynthesis

Full text

Species taxonomy

Eukaryota; Opisthokonta; Metazoa; Porifera; Demospongiae; Heteroscleromorpha; Poecilosclerida; Cladorhizidae; Lycopodina; Lycopodina hypogea (Vacelet & Boury-Esnault, 1996) (NCBI:txid2491183)

Background

Lycopodina hypogea is one of the most iconic sponges described in the last 30 years (Vacelet & Boury-Esnault, 1996). The discovery of this species redefined the concept of sponges, as this sponge has replaced its filter-feeding body plan with a carnivorous feeding habit. This marine sponge belongs to Cladorhizidae, a family of Porifera generally living in deep-sea environments and mostly sharing this unusual feeding strategy.

The genus Lycopodina includes the deepest sponge ever recorded, L. occidentalis (Lambe, 1883), which was reported by Koltun (1970) at a depth of 8 840 m. L. hypogea has a Mediterranean and south European Atlantic shelf distribution, in bathyal environments and in dark littoral caves (Chevaldonné et al., 2015). Because it occurs in littoral caves, this sponge can be easily collected by scuba diving, and it can be maintained and cultured under laboratory conditions for years. Sub-populations of tens of specimens only require rather small aquaria, kept at a stable temperature (13–15 °C), with a monthly food supply (Artemia nauplii) and a supply of fresh seawater. This model sponge and its very simple culture methods thus offer exceptional experimental conditions, which have already led to significant advances in our knowledge of sponges in this family and of deep-sea organisms in general.

Lycopodina hypogea is small, only 20 mm high, and is composed of a fixation base, a thin peduncle and an ovoid body bearing numerous long filaments. All these structures are supported by monaxial siliceous spicules. The filaments are covered in small, hook-shaped spicules (anisochelae) arranged at right angles to the surface, which give the filaments an adhesiveness that ensures the capture of “hairy” prey. The sponge body comprises diverse cell types (pinacocytes, archaeocytes, sclerocytes, cells with inclusions and two types of bacteriocytes), which are well illustrated by transmission electron microscopy (Vacelet & Boury-Esnault, 1996).

Previous DNA sequencing using 454 pyrosequencing revealed a highly diverse microbial community, comprising at least 22 prokaryotic phyla, dominated by Proteobacteria, Bacteroidetes and Thaumarchaeota. A high abundance of ammonia-oxidising archaea and a dominance of sulphate oxidising/reducing bacteria were observed (Dupont et al., 2013; Dupont et al., 2014). A few other functional roles of the sponge microbiome were investigated through antioxidant, antimicrobial and chitinase assays (Dupont et al., 2013). Moreover, methanotrophic symbiotic bacteria may also be present, as they have been found in other Cladorhizidae (Vacelet et al., 1995; Vacelet et al., 1996).

Sexual reproduction has also been investigated under experimental conditions. In the absence of choanocytes, from which male gametes are derived in the vast majority of known sponges, spermatozoa here originate from archaeocyte-like cells. They are concentrated and then transported in special spermatophores, which bear special spicules (forceps). Once the spermatophore is released by the filament, these spicules ensure its capture on the filaments of an individual to be fertilised (Vacelet et al., 2022). Although L. hypogea regularly reproduces in aquaria, its oogenesis and larval development remain enigmatic.

After the capture of a prey item, generally small crustaceans, polychaetes or other invertebrates with appendages that can become entangled in the filaments of the sponge, the digestive process starts by migration of sponge cells towards the prey. These cells, including cells of the body, but also of filaments not involved in the capture or even cells of the fixation base, cover the prey and incorporate fragments of its cells by phagocytosis. Such a mechanism of digestion is highly unusual in animals, and the movement and high rate of cell renewal make this sponge a model of choice in biology, particularly for studying processes such as apoptosis (Baghdiguian et al., 2023; Godefroy et al., 2019; Le Goff et al., 2022).

So far, no toxic or paralysing secretion, digestive cavity or digestive enzyme has been identified. However, chlorinated compounds have been detected in the water around L. hypogea kept in aquaria, and a hypothesis currently under investigation is that this carnivorous sponge produces specialised metabolites that could both attract prey and contribute to their immobilisation (Pérez and colleagues, in progress).

These examples of fundamental knowledge acquired thanks to the accessibility and ease of cultivation of these sponges illustrate how access to the entire hologenome of this model sponge will make it possible to address many specific scientific questions on the biology and ecology of carnivorous sponges and other deep-sea organisms, as well as more general questions on the biology of metazoans.

We present a chromosome-level genome sequence for Lycopodina hypogea. The assembly was generated as part of the Aquatic Symbiosis Genomics project, using the Tree of Life pipeline from a specimen collected from La Ciotat, France (Figure 1).

Methods

Sample acquisition

The specimen used for genome sequencing was an adult Lycopodina hypogea (specimen ID GHC0000171, ToLID odLycHypo2; Figure 1). A second specimen was used for Hi-C sequencing (specimen ID GHC0000173, ToLID odLycHypo4) and a third for RNA sequencing (specimen ID GHC0000177, ToLID odLycHypo8). The samples were collected from La Ciotat, France (latitude 43.1633, longitude 5.5999) on 2021-03-23. Thierry Pérez collected and formally identified the species.

Nucleic acid extraction

Protocols for high molecular weight (HMW) DNA extraction developed at the Wellcome Sanger Institute (WSI) Tree of Life Core Laboratory are available on protocols.io (Howard et al., 2025). The odLycHypo2 sample was weighed and triaged to determine the appropriate extraction protocol. Tissue was homogenised using the sponge squeezing protocol. HMW DNA was extracted using the Manual MagAttract v3 protocol. We used centrifuge-mediated fragmentation to produce DNA fragments in the 8–10 kb range, following the Covaris g-TUBE protocol for ultra-low input (ULI). Sheared DNA was purified by automated SPRI (solid-phase reversible immobilisation). The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and Qubit Fluorometer using the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system. For this sample, the final post-shearing DNA had a Qubit concentration of 1.9 ng/μL and a yield of 741.00 ng.

RNA was extracted from whole organism tissue of odLycHypo8 in the Tree of Life Laboratory at the WSI using the RNA Extraction: Automated MagMax™ mirVana protocol. The RNA concentration was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer using the Qubit RNA Broad-Range Assay kit. Analysis of the integrity of the RNA was done using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

PacBio HiFi library preparation and sequencing

Library preparation and sequencing were performed at the WSI Scientific Operations core. Prior to library preparation, the DNA was fragmented to ~10 kb. Ultra-low-input (ULI) libraries were prepared using the PacBio SMRTbell® Express Template Prep Kit 2.0 and gDNA Sample Amplification Kit. Samples were normalised to 20 ng DNA. Single-strand overhang removal, DNA damage repair, and end-repair/A-tailing were performed according to the manufacturer’s instructions, followed by adapter ligation. A 0.85× pre-PCR clean-up was carried out with Promega ProNex beads.

The DNA was evenly divided into two aliquots for dual PCR (reactions A and B), both following the manufacturer’s protocol. A 0.85× post-PCR clean-up was performed with ProNex beads. DNA concentration was measured using a Qubit Fluorometer v4.0 (Thermo Fisher Scientific) with the Qubit HS Assay Kit, and fragment size was assessed on an Agilent Femto Pulse Automated Pulsed Field CE Instrument (Agilent Technologies) using the gDNA 55 kb BAC analysis kit. PCR reactions A and B were then pooled, ensuring a total mass of ≥500 ng in 47.4 μL.

The pooled sample underwent another round of DNA damage repair, end-repair/A-tailing, and hairpin adapter ligation. A 1× clean-up was performed with ProNex beads, followed by DNA quantification using the Qubit and fragment size analysis using the Agilent Femto Pulse. Size selection was performed on the Sage Sciences PippinHT system, with target fragment size determined by Femto Pulse analysis (typically 4–9 kb). Size-selected libraries were cleaned with 1.0× ProNex beads and normalised to 2 nM before sequencing.

The sample was sequenced on a Revio instrument (Pacific Biosciences). The prepared library was normalised to 2 nM, and 15 μL was used for making complexes. Primers were annealed and polymerases bound to generate circularised complexes, following the manufacturer’s instructions. Complexes were purified using 1.2× SMRTbell beads, then diluted to the Revio loading concentration (200–300 pM) and spiked with a Revio sequencing internal control. The sample was sequenced on a Revio 25M SMRT cell. The SMRT Link software (Pacific Biosciences), a web-based workflow manager, was used to configure and monitor the run and to carry out primary and secondary data analysis.

Hi-C

Sample preparation and crosslinking

The Hi-C sample was prepared from 20–50 mg of frozen tissue from the odLycHypo4 sample using the Arima-HiC v2 kit (Arima Genomics). Following the manufacturer’s instructions, tissue was fixed and DNA crosslinked using TC buffer to a final formaldehyde concentration of 2%. The tissue was homogenised using the Diagnocine Power Masher-II. Crosslinked DNA was digested with a restriction enzyme master mix, biotinylated, and ligated. Clean-up was performed with SPRISelect beads before library preparation. DNA concentration was measured with the Qubit Fluorometer (Thermo Fisher Scientific) and Qubit HS Assay Kit. The biotinylation percentage was estimated using the Arima-HiC v2 QC beads.

Hi-C library preparation and sequencing

Biotinylated DNA constructs were fragmented using a Covaris E220 sonicator and size selected to 400–600 bp using SPRISelect beads. DNA was enriched with Arima-HiC v2 kit Enrichment beads. End repair, A-tailing, and adapter ligation were carried out with the NEBNext Ultra II DNA Library Prep Kit (New England Biolabs), following a modified protocol where library preparation occurs while DNA remains bound to the Enrichment beads. Library amplification was performed using KAPA HiFi HotStart mix and a custom Unique Dual Index (UDI) barcode set (Integrated DNA Technologies). Depending on sample concentration and biotinylation percentage determined at the crosslinking stage, libraries were amplified with 10–16 PCR cycles. Post-PCR clean-up was performed with SPRISelect beads. Libraries were quantified using the AccuClear Ultra High Sensitivity dsDNA Standards Assay Kit (Biotium) and a FLUOstar Omega plate reader (BMG Labtech).

Prior to sequencing, libraries were normalised to 10 ng/μL. Normalised libraries were quantified again to create equimolar and/or weighted 2.8 nM pools. Pool concentrations were checked using the Agilent 4200 TapeStation (Agilent) with High Sensitivity D500 reagents before sequencing. Sequencing was performed using paired-end 150 bp reads on the Illumina NovaSeq 6000.

RNA library preparation and sequencing

Libraries were prepared using the NEBNext® Ultra™ II Directional RNA Library Prep Kit for Illumina (New England Biolabs), following the manufacturer’s instructions. Poly(A) mRNA in the total RNA solution was isolated using oligo(dT) beads, converted to cDNA, and uniquely indexed; 14 PCR cycles were performed. Libraries were size-selected to produce fragments of 100–300 bp. Libraries were quantified, normalised, pooled to a final concentration of 2.8 nM, and diluted to 150 pM for loading. Sequencing was carried out on the Illumina NovaSeq X, generating paired-end reads.

Genome assembly

Prior to assembly of the PacBio HiFi reads, a database of k-mer counts (k = 31) was generated from the filtered reads using FastK. GenomeScope2 (Ranallo-Benavidez et al., 2020) was used to analyse the k-mer frequency distributions, providing estimates of genome size, heterozygosity, and repeat content.
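As a rough illustration of the k-mer counting step, the sketch below tallies canonical k-mers in a few toy reads and builds the multiplicity histogram (the k-mer spectrum) that profiling tools take as input. It is not FastK and does not reproduce its output format. The function names, the toy reads and the small value of k are assumptions chosen for readability; the analysis described here used k = 31.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_canonical_kmers(reads, k):
    """Count canonical k-mers (the lexicographically smaller of a k-mer
    and its reverse complement) across a collection of reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # skip k-mers containing ambiguous bases such as N
            counts[min(kmer, reverse_complement(kmer))] += 1
    return counts


def kmer_spectrum(counts):
    """Map each multiplicity to the number of distinct k-mers seen that often."""
    return Counter(counts.values())


# Hypothetical toy reads; real input would be the filtered HiFi reads.
toy_reads = ["ACGTACGTAC", "CGTACGTACG", "TTTTACGTAC"]
toy_counts = count_canonical_kmers(toy_reads, k=5)
for multiplicity, n_kmers in sorted(kmer_spectrum(toy_counts).items()):
    print(multiplicity, n_kmers)
```

Counting canonical k-mers means a k-mer and its reverse complement are treated as one, because reads come from either strand.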

The HiFi reads were assembled using Hifiasm (Cheng et al., 2021) with the --primary option. Haplotypic duplications were identified and removed using purge_dups (Guan et al., 2020). The Hi-C reads (Rao et al., 2014) were mapped to the primary contigs using bwa-mem2 (Vasimuddin et al., 2019), and the contigs were scaffolded in YaHS (Zhou et al., 2023) with the --break option for handling potential misassemblies. The scaffolded assemblies were evaluated using Gfastats (Formenti et al., 2022), BUSCO (Manni et al., 2021) and Merqury.FK (Rhie et al., 2020).

The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2023).

Assembly curation

The assembly was decontaminated using the Assembly Screen for Cobionts and Contaminants (ASCC) pipeline. TreeVal was used to generate the flat files and maps for use in curation. Manual curation was conducted primarily in PretextView and HiGlass (Kerpedjiev et al., 2018). Scaffolds were visually inspected and corrected as described by Howe et al. (2021). Manual corrections included 49 breaks and 19 joins. This reduced the scaffold count by 89.0%, increased the scaffold N50 by 15.6%, and reduced the total assembly length by 23.9%. The curation process is described at https://gitlab.com/wtsi-grit/rapid-curation. PretextSnapshot was used to generate a Hi-C contact map of the final assembly.

Assembly quality assessment

The Merqury.FK tool (Rhie et al., 2020) was run in a Singularity container (Kurtzer et al., 2017) to evaluate k-mer completeness and assembly quality for the primary and alternate haplotypes using the k-mer databases (k = 31) computed prior to genome assembly. The analysis outputs included assembly QV scores and completeness statistics.

The genome was analysed using the BlobToolKit pipeline, a Nextflow implementation of the earlier Snakemake version (Challis et al., 2020). The pipeline aligns PacBio reads using minimap2 (Li, 2018) and SAMtools (Danecek et al., 2021) to generate coverage tracks. It runs BUSCO (Manni et al., 2021) using lineages identified from the NCBI Taxonomy (Schoch et al., 2020). For the three domain-level lineages, BUSCO genes are aligned to the UniProt Reference Proteomes database (Bateman et al., 2023) using DIAMOND blastp (Buchfink et al., 2021). The genome is divided into chunks based on the density of BUSCO genes from the closest taxonomic lineage, and each chunk is aligned to the UniProt Reference Proteomes database with DIAMOND blastx. Sequences without hits are chunked using seqtk and aligned to the NT database with blastn (Altschul et al., 1990). The BlobToolKit suite consolidates all outputs into a blobdir for visualisation. The BlobToolKit pipeline was developed using nf-core tooling (Ewels et al., 2020) and MultiQC (Ewels et al., 2016), with containerisation through Docker (Merkel, 2014) and Singularity (Kurtzer et al., 2017).

Metagenome assembly

The metagenome assembly was generated using MetaMDBG (Benoit et al., 2024) and binned using dastool_raw. PROKKA (Seemann, 2014) was used to identify tRNAs and rRNAs in each bin, CheckM (Parks et al., 2015) (checkM_DB release 2015-01-16) was used to assess bin completeness/contamination, and GTDB-Tk (Chaumeil et al., 2022) (GTDB release 214) was used to taxonomically classify bins. Taxonomic replicate bins were identified using dRep (Olm et al., 2017) with default settings (95% ANI threshold). All bins were assessed for quality and categorised as metagenome-assembled genomes (MAGs) if they met the following criteria: contamination ≤ 5%, presence of 5S, 16S, and 23S rRNA genes, at least 18 unique tRNAs, and either ≥ 90% completeness or ≥ 50% completeness with fully circularised chromosomes (Bowers et al., 2017). Bins that did not meet these thresholds, or were identified as taxonomic replicates of MAGs, were retained as ‘binned metagenomes’ provided they had ≥ 50% completeness and ≤ 10% contamination. A taxonomic tree of the bins was constructed from NCBI classifications using ete3 (Huerta-Cepas et al., 2016) and visualised with matplotlib.
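The bin categorisation rules above can be written as a small decision function. The following is a minimal sketch rather than the pipeline's own code. The `Bin` record, its field names and the `"unclassified"` label for bins meeting neither set of thresholds are hypothetical; only the numeric thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    """Hypothetical per-bin summary; field names are illustrative only."""
    completeness: float            # percent, e.g. from CheckM
    contamination: float           # percent, e.g. from CheckM
    has_5s: bool
    has_16s: bool
    has_23s: bool
    unique_trnas: int
    circular: bool = False         # fully circularised chromosome(s)
    replicate_of_mag: bool = False  # taxonomic replicate of a MAG


def classify_bin(b):
    """Apply the MAG and 'binned metagenome' thresholds described in the text."""
    has_all_rrnas = b.has_5s and b.has_16s and b.has_23s
    complete_enough = b.completeness >= 90 or (b.completeness >= 50 and b.circular)
    is_mag = (
        b.contamination <= 5
        and has_all_rrnas
        and b.unique_trnas >= 18
        and complete_enough
        and not b.replicate_of_mag
    )
    if is_mag:
        return "MAG"
    if b.completeness >= 50 and b.contamination <= 10:
        return "binned metagenome"
    return "unclassified"  # assumed label; the text does not name this case


# Invented example values, not bins from this study.
print(classify_bin(Bin(95.0, 1.2, True, True, True, 20)))
print(classify_bin(Bin(62.0, 0.5, True, True, True, 19, circular=True)))
print(classify_bin(Bin(93.0, 2.0, True, True, False, 20)))
```

In this sketch a bin that is complete and clean but lacks one rRNA gene falls through to the 'binned metagenome' category, mirroring the fallback rule in the text.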

Genome sequence report

Sequence data

PacBio sequencing of the Lycopodina hypogea specimen generated 138.64 Gb (gigabases) from 17.91 million reads, which were used to assemble the genome. GenomeScope2.0 analysis estimated the haploid genome size at 788.73 Mb, with a heterozygosity of 1.55% and repeat content of 81.18% (Figure 2). These estimates guided expectations for the assembly. Based on the estimated genome size, the sequencing data provided approximately 77× coverage. Hi-C sequencing produced 125.18 Gb from 828.98 million reads, which were used to scaffold the assembly. RNA sequencing data were also generated and are available in public sequence repositories. Table 1 summarises the specimen and sequencing details.
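One simple sanity check on the totals in Table 1 is to divide the base count by the read count to get a mean read length per platform. The sketch below does only that. The dictionary layout and function name are assumptions, and the figures are the rounded totals from Table 1, so the results are approximate.

```python
# Rounded totals from Table 1: (total bases, total reads).
sequencing_runs = {
    "PacBio HiFi": (138.64e9, 17.91e6),
    "Hi-C": (125.18e9, 828.98e6),
    "RNA-seq": (15.41e9, 102.06e6),
}


def mean_read_length(total_bases, total_reads):
    """Mean read length in bases, from run-level totals."""
    return total_bases / total_reads


mean_lengths = {
    platform: mean_read_length(bases, reads)
    for platform, (bases, reads) in sequencing_runs.items()
}

for platform, length in mean_lengths.items():
    print(f"{platform}: {length:,.0f} bp")
```

With these inputs the HiFi reads average about 7.7 kb, and the Hi-C reads average about 151 bp, in line with the paired-end 150 bp sequencing described in the Methods.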

Assembly statistics

The primary haplotype was assembled, and contigs corresponding to an alternate haplotype were also deposited in INSDC databases. The final assembly has a total length of 235.10 Mb in 199 scaffolds, with 1 930 gaps, and a scaffold N50 of 16.02 Mb (Table 2).

Most of the assembly sequence (98.85%) was assigned to 15 chromosome-level scaffolds. These chromosome-level scaffolds, confirmed by Hi-C data, are named according to size (Figure 3; Table 3).

The mitochondrial genome was also assembled (length 31.1 kb, OZ017794.1). This sequence is included as a contig in the multifasta file of the genome submission and as a standalone record.
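The headline assembly statistics can be cross-checked against the values in Tables 2 and 3. The sketch below recomputes the chromosomal fraction, the scaffold N50 and the gap count. It rests on two assumptions not stated in the text: that every unplaced scaffold is shorter than the smallest chromosome, so the N50 is determined by the chromosomes alone, and that each gap joins two contigs within a scaffold, so gaps equal contigs minus scaffolds. Variable and function names are illustrative.

```python
# Chromosome lengths in Mb, as listed in Table 3 (already sorted by size).
chromosome_lengths_mb = [
    21.07, 16.99, 16.60, 16.47, 16.37, 16.29, 16.02, 15.84,
    15.31, 14.45, 14.39, 13.51, 13.26, 12.94, 12.89,
]
assembly_span_mb = 235.10  # Table 2
n_contigs = 2129           # Table 2
n_scaffolds = 199          # Table 2


def n50(lengths_desc, total_span):
    """Length of the sequence at which the running total, taken from the
    longest sequence downwards, first reaches half of total_span."""
    running = 0.0
    for length in lengths_desc:
        running += length
        if running >= total_span / 2:
            return length
    raise ValueError("lengths do not cover half of the total span")


# Share of the assembly placed in chromosomes (rounded table values).
chromosomal_fraction = sum(chromosome_lengths_mb) / assembly_span_mb

# Assumes all unplaced scaffolds are shorter than the smallest chromosome.
scaffold_n50_mb = n50(sorted(chromosome_lengths_mb, reverse=True), assembly_span_mb)

# Assumes each gap sits between two contigs of the same scaffold.
expected_gaps = n_contigs - n_scaffolds

print(f"chromosomal fraction: {chromosomal_fraction:.2%}")
print(f"scaffold N50: {scaffold_n50_mb} Mb")
print(f"expected gaps: {expected_gaps}")
```

Under these assumptions the recomputed values agree with the reported 98.85%, 16.02 Mb and 1 930 gaps.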

Assembly quality metrics

BUSCO v.5.5.0 analysis using the metazoa_odb10 reference set (n = 954) identified 69.0% of the expected gene set (single = 68.3%, duplicated = 0.6%). The snail plot in Figure 4 summarises the scaffold length distribution and other assembly statistics for the primary assembly. The blob plot in Figure 5 shows the distribution of scaffolds by GC proportion and coverage.
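The BUSCO summary string in Table 2 is compact, so it can help to unpack it. The sketch below parses the string with a regular expression and checks that the complete, fragmented and missing categories account for the whole gene set. The function name and the derived approximate counts are illustrative and are not values reported by BUSCO here.

```python
import re


def parse_busco(summary):
    """Parse a BUSCO short summary such as
    'C:69.0% [S:68.3%; D:0.6%]; F:11.6%; M:19.4%; n:954'
    into percentages keyed by category, plus the gene-set size n."""
    fields = {
        key: float(value)
        for key, value in re.findall(r"([CSDFM]):([\d.]+)%", summary)
    }
    fields["n"] = int(re.search(r"n:(\d+)", summary).group(1))
    return fields


busco = parse_busco("C:69.0% [S:68.3%; D:0.6%]; F:11.6%; M:19.4%; n:954")

# Complete + fragmented + missing should cover the full gene set.
category_total = busco["C"] + busco["F"] + busco["M"]

# Approximate gene counts implied by the rounded percentages.
approx_counts = {
    key: round(busco[key] / 100 * busco["n"]) for key in ("C", "F", "M")
}

print(f"C + F + M = {category_total:.1f}%")
print(approx_counts)
```

The single-copy and duplicated percentages sum to 68.9%, which differs from the reported 69.0% complete by 0.1 percentage points; this is consistent with each percentage being rounded independently.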

Taxonomic verification

The CO1 barcoding sequence (Folmer fragment) of this specimen is 100% identical to L. hypogea sequences HG424317–HG424322. These sequences were published in a population study on L. hypogea (Chevaldonné et al., 2015).

Genome annotation report

The Lycopodina hypogea genome assembly (GCA_963969325.1) was annotated by Ensembl at the European Bioinformatics Institute (EBI). This annotation includes 16 357 transcribed mRNAs from 16 317 protein-coding genes. The average transcript length is 5 381.60 bp, with an average of 5.36 exons per transcript. For further information about the annotation, please refer to the Ensembl annotation page.

Metagenome report

We recovered 39 bins from the metagenome assembly (Figure 6), of which 27 met the criteria for MAGs, including four fully circularised genomes. The recovered bins represented 12 bacterial and archaeal phyla, with genome sizes ranging from 0.87 to 7.67 Mbp (mean: 2.51 ± 1.36 Mbp). Mean completeness was 88.2% (± 13.5%) with 1.4% (± 1.7%) contamination. Figure 7 summarises the taxa and quality of the metagenome bins. The full per-bin table of taxa and quality metrics for the metagenome bins is available on Zenodo.

Wellcome Sanger Institute – Legal and Governance

The materials that have contributed to this genome note have been supplied by a Tree of Life collaborator. The Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible.

The overarching areas of consideration are:

- Ethical review of provenance and sourcing of the material
- Legality of collection, transfer and use (national and international)

Each transfer of samples is undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Tree of Life collaborator, Genome Research Limited (operating as the Wellcome Sanger Institute) and in some circumstances other Tree of Life collaborators.

Bibliography

1. Altschul SF, Gish W, Miller W, et al.: Basic Local Alignment Search Tool. J Mol Biol. 1990;215(3):403–410. doi: 10.1016/S0022-2836(05)80360-2. PMID: 2231712.
2. Baghdiguian S, Le Goff E, Paradis L, et al.: Using the carnivorous sponge Lycopodina hypogea as a nonclassical model for understanding apoptosis-mediated shape homeostasis at the organism level. Foundations. 2023;3(2):220–230. doi: 10.3390/foundations3020018.
3. Bateman A, Martin MJ, Orchard S, et al.: UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 2023;51(D1):D523–D531. doi: 10.1093/nar/gkac1052. PMID: 36408920. PMCID: PMC9825514.
4. Benoit G, Raguideau S, James R, et al.: High-quality metagenome assembly from long accurate reads with metaMDBG. Nat Biotechnol. 2024;42(9):1378–1383. doi: 10.1038/s41587-023-01983-6. PMID: 38168989. PMCID: PMC11392814.
5. Bowers RM, Kyrpides NC, Stepanauskas R, et al.: Minimum Information about a Single Amplified Genome (MISAG) and a Metagenome-Assembled Genome (MIMAG) of bacteria and archaea. Nat Biotechnol. 2017;35(8):725–731. doi: 10.1038/nbt.3893. PMID: 28787424. PMCID: PMC6436528.
6. Buchfink B, Reuter K, Drost HG: Sensitive protein alignments at Tree-of-Life scale using DIAMOND. Nat Methods. 2021;18(4):366–368. doi: 10.1038/s41592-021-01101-x. PMID: 33828273. PMCID: PMC8026399.
7. Challis R, Richards E, Rajan J, et al.: BlobToolKit - interactive quality assessment of genome assemblies. G3 (Bethesda). 2020;10(4):1361–1374. doi: 10.1534/g3.119.400908. PMID: 32071071. PMCID: PMC7144090.
8. Chaumeil PA, Mussig AJ, Hugenholtz P, et al.: GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics. 2022;38(23):5315–5316. doi: 10.1093/bioinformatics/btac672. PMID: 36218463. PMCID: PMC9710552.