The genome of the reef-building coral Porites harrisoni from the southern Persian/Arabian Gulf
Anna Fiesinger, Abdoallah Sharaf, Rachel Alderdice, Gabriela Perna, Hannah Manns, John A. Burt, Christian R. Voolstra, Anna Fiesinger, Anna Fiesinger

TL;DR
This paper presents the genome of a coral species from a very hot region, which helps understand how corals survive extreme heat.
Contribution
The study provides a high-quality genome assembly of Porites harrisoni from the Persian/Arabian Gulf, a region with extreme thermal conditions.
Findings
The genome assembly is 626.7 Mb in size with a BUSCO completeness of 86.3%.
The nuclear genome contains 27,823 protein-coding genes and a high proportion of repeats.
The mitochondrial genome is a single contig of 18,639 bp with typical coral mitogenome features.
Abstract
We present a genome assembly from the coral species Porites harrisoni from the southern Persian/Arabian Gulf, the hottest ocean basin where corals live. The assembly is 626.7 Mb in size, spanning 1,883 contigs with a contig N50 of 807.4 kb, including a single-contig mitochondrial genome. The assembly has a BUSCO completeness of 86.3% (single = 72.5%, duplicated = 13.7%, fragmented = 1.2%, missing = 12.5%). Within the nuclear genome, 59.23% are repeats (15.89% retroelements, 10.00% DNA transposons, and 31.71% unclassified repeats). Gene annotation of the nuclear genome assembly identified 27,823 protein-coding genes. The mitogenome is 18,639 bp long, with 13 protein-coding genes, 2 tRNAs, and 2 rRNAs. The P. harrisoni genome is a valuable resource of a coral from an extreme environment, enhancing understanding of the genomic architecture underlying thermal resilience. Comparative…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5|
| ||||
| Study title |
| |||
| ToLID | jaPorHarr | |||
| BioProject ID |
| |||
| BioProject title |
| |||
| Species |
| |||
| BioSample ID |
| |||
| NCBI taxonomy ID | 627007 | |||
|
| ||||
|
|
|
|
|
|
|
| ||||
| UAE_SA_Phar_03 | ONT library (ligation protocol LSK114) | ONT R10.4.1 |
|
|
|
| ||||
| UAE_SA_Phar_01 | NEBNext Ultra II Directional mRNA | Illumina NovaSeq |
|
|
| UAE_SA_Phar_02 | NEBNext Ultra II Directional mRNA | Illumina NovaSeq |
|
|
| UAE_SA_Phar_03 | NEBNext Ultra II Directional mRNA | Illumina NovaSeq |
|
|
| UAE_SA_Phar_04 | NEBNext Ultra II Directional mRNA | Illumina NovaSeq |
|
|
| UAE_SA_Phar_05 | NEBNext Ultra II Directional mRNA | Illumina NovaSeq |
|
|
| UAE_SA_Phar_06 | NEBNext Ultra II Directional mRNA | Illumina NovaSeq |
|
|
| UAE_SA_Phar_07 | NEBNext Ultra II Directional mRNA | Illumina NovaSeq |
|
|
| UAE_SA_Phar_08 | NEBNext Ultra II Directional mRNA | Illumina NovaSeq |
|
|
| UAE_SA_Phar_09 | NEBNext Ultra II Directional mRNA | Illumina NovaSeq |
|
|
| UAE_SA_Phar_10 | NEBNext Ultra II Directional mRNA | Illumina NovaSeq |
|
|
|
| |
| Estimated genome size (Mb) | 599 |
| Assembled genome size (Mb) | 626.7 |
| Number of contigs | 1,883 |
| Contig N50 (kb) | 807.4 |
| Longest contig (Mb) | 5.5 |
| Shortest contig (bp) | 487 |
| Average contig length (kb) | 332.8 |
| Repeat content (%) | 59.23 |
| GC content (%) | 39.06 |
| Number of gaps in contigs | 0 |
|
| |
| Complete (%) | 86.3 |
| - Single-copy (%) | 72.5 |
| - Duplicated (%) | 13.7 |
| Fragmented (%) | 1.2 |
| Missing (%) | 12.5 |
|
|
|
|
|
|
|
|---|---|---|---|---|---|
|
|
|
|
| ||
| SINEs | 20,234 | 4,048,363 | 0.65 | ||
| Penelope | 4,202 | 872,880 | 0.14 | ||
| LINEs | 181,989 | 54,437,356 | 8.69 | ||
| CRE/SLACS | 2,294 | 1,323,659 | 0.21 | ||
| L2/CR1/Rex | 94,299 | 28,218,925 | 4.50 | ||
| R2/R4/NeSL | 8,477 | 2,181,497 | 0.35 | ||
| RTE/Bov-B | 25,731 | 7,080,823 | 1.13 | ||
| L1/CIN4 | 468 | 489,642 | 0.08 | ||
| LTR elements | 115,298 | 40,237,751 | 6.42 | ||
| BEL/Pao | 6,013 | 5,910,495 | 0.94 | ||
| Ty1/Copia | 2,579 | 507,411 | 0.08 | ||
| Gypsy/DIRS1 | 23,041 | 12,224,047 | 1.95 | ||
| Retroviral | 12,561 | 2,721,185 | 0.43 | ||
|
|
|
|
| ||
| hobo-Activator | 34,240 | 8,524,951 | 1.36 | ||
| Tc1-IS630-Pogo | 19,013 | 4,057,297 | 0.65 | ||
| MULE-MuDR | 1,125 | 211,018 | 0.03 | ||
| Tourist/Harbinger | 11,439 | 2,810,493 | 0.45 | ||
|
|
|
|
| ||
|
|
|
|
| ||
|
|
|
|
| ||
|
|
|
|
| ||
|
|
|
|
|
|
| |
| Protein-coding genes | 27,823 |
| Average gene length (bp) | 10,193 |
| Average protein length (aa) | 479 |
| High-confidence tRNAs | 674 |
|
| |
| EggNOG | 21,927 |
| InterProScan | 21,802 |
| COG | 20,717 |
| Pfam | 19,492 |
| GO terms | 16,430 |
| SignalP | 3,959 |
| Phobius | 3,906 |
| BUSCO (metazoa_odb10, | 1,211 |
| MEROPS | 1,049 |
| CAZYme | 505 |
|
| |
| Complete (%) | 83.5 |
| - Single-copy (%) | 61.2 |
| - Duplicated (%) | 22.3 |
| Fragmented (%) | 2.4 |
| Missing (%) | 14.1 |
|
|
|
|
|---|---|---|
| ETE toolkit | 3.1.3 |
|
| MinKNOW | 23.07.12 |
|
| Dorado | 7.1.4 |
|
| porechop | 0.2.4 |
|
| NanoPlot | 1.44.1 |
|
| chopper | 0.9.0 |
|
| NCBI datasets | 16.14.0 |
|
| Meryl | 1.3 |
|
| GenomeScope2.0 | 2.0 |
|
| NECAT | 0.0.1 |
|
| BlobToolKit | 4.3.0 |
|
| Racon | 1.5.0 |
|
| Medaka | 1.12.0 |
|
| funannotate | 1.8.5 |
|
| BUSCO | 6.0.0 |
|
| minimap2 | 2.28 |
|
| Canu | 2.3 |
|
| Circlator | 1.5.5 |
|
| MITOS2 | 2.1.9 |
|
| Geneious Prime | 2025.1.1 |
|
| RepeatModeler | 2.0.4 |
|
| EDTA | 2.2.2 |
|
| RepeatMasker | 4.1.6 |
|
| FastQC | 0.12.1 |
|
| Trimmomatic | 0.39 |
|
| BRAKER3 | 3.0.8 |
|
| STAR | 2.7.11b |
|
| SAMtools | 1.16.1 |
|
| tRNAscan-SE | 2.0.12 |
|
| AGAT | 1.4.1 |
|
| GenomeTools | 1.6.5 |
|
| GffRead | 0.12.7 |
|
| InterProScan | 5.59–91.0 |
|
| EggNOG-mapper | 2.1.12 |
|
| Phobius | 1.01 |
|
| GeneForge | 1.0 |
|
- —Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)
- —Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)
- —Sequencing Analysis (SequAna) Core Facility
- —Paul G. Allen Family Foundation (PGAFF) project
- —Paul G. Allen Family Foundation (PGAFF) project
- —NYUAD Research Institute
- —NYUAD Research Institute
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsCoral and Marine Ecosystems Studies · Marine Sponges and Natural Products · Genomics and Phylogenetic Studies
Introduction
Corals surviving in extreme environments (i.e., experiencing variable and extreme water temperatures as well as high salinities) can serve as model organisms to study the adaptive capacity of corals in more temperate regions. Such corals offer valuable insights into the genomic architecture and genes underlying thermal resilience [1–3]. The hottest oceanic basin where corals occur is the shallow and semi‐enclosed Persian/Arabian Gulf (PAG), where temperature extremes and ranges exceed those of coral reefs elsewhere (<12 °C to >36 °C annually) [4, 5]. Due to the relatively recent geological formation of the PAG, estimated at about 6–12 ka, and its shift to a hot climate, corals in these waters have had little time to adapt to these extreme conditions [6]. As a result, they inform our understanding of how coral reefs at large may respond to rapid warming.
One prominent coral in the PAG, and among the species most resilient to recurrent bleaching events [7], is the endemic species Porites harrisoni Veron, 2000 (Figure 1) from the complex coral clade. The complex clade comprises many ecologically dominant reef builders [8, 9], with the genus Porites hosting several thermally resilient species [10–14]. Colonies of *P. harrisoni *abundantly inhabit shallow fringing reefs in the PAG [15], and *P. harrisoni *exhibits diverse growth forms – submassive, nodular, columnar, or branching – on a broad encrusting base, with coloration varying from light to dark brown [15]. The thermal resilience of P. harrisoni is in part attributed to its association with the thermotolerant symbiotic alga Cladocopium thermophilum, especially in the southern PAG [6, 16, 17]. This alga has undergone rapid adaptive radiation upon colonizing the PAG [18]. In line with this, several studies suggest that PAG corals carry unique genomic signatures underpinning their exceptional heat tolerance [19, 20], conferring heritable thermal tolerance traits [20, 21]. Like many other corals in the PAG, P. harrisoni has one of the highest ecological bleaching thresholds in the world [22]. Originally described as a different species, P. harrisoni was first identified for its exceptional tolerance to extreme temperatures in a study highlighting the thermal tolerance of PAG corals [23] and later recognized for its persistence in hypersaline, thermally extreme lagoons where most corals are unable to survive [24]. However, this species also lives exceptionally close to its upper thermal tolerance threshold [25]. This species may provide critical insight into the genomic basis of coral thermal resilience, as it has persisted under harsh conditions that have eliminated many other corals, particularly in the southern PAG, due to recurrent bleaching events [12]. This understanding is crucial given the increasing rate of ocean warming.
Here, we report on the sequencing, assembly, and annotation of the genome of P. harrisoni from the southern PAG. Species taxonomy for this species is Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Cnidaria; Anthozoa; Hexacorallia; Scleractinia; Fungiina; Poritidae; Porites; Porites harrisoni Veron, 2000 (NCBI:txid627007). The genome assembly is based on long-read Oxford Nanopore Technologies (ONT) sequencing. The generated genome is 626.7 Mb, comprising 1,883 contigs with a contig N50 of 807.4 kb. The nuclear genome annotation yielded 27,823 protein-coding genes, validated through RNA-Seq data and available curated Metazoa proteomes. We also assembled a complete single-contig mitochondrial genome (mitogenome), which should inform biodiversity assessment and further contribute to resilience understanding [26, 27]. The genome of P. harrisoni will facilitate comparisons among coral species regarding the genomic basis of thermal resilience.
The species Porites harrisoni, Veron 2000. (A) Cladogram illustrating evolutionary relationships of Porites harrisoni (bold) and other Hexacorallia with sequenced genomes [28–35]. The data is adapted from a simple species tree generated with the NCBI ETE v3.1.3 toolkit [36]. Robust and complex coral clades are indicated [8, 9, 37]. A full-scale phylogenetic tree illustrating evolutionary relationships of the genus Porites within the Scleractinia is available from a recent study [38]. (B) In situ colony photo of P. harrisoni from Saadiyat Reef in the PAG. Picture credit: Christian R. Voolstra.
Material and Methods
Sample collection, DNA isolation, and sequencing
One colony of Porites harrisoni was tagged and sampled for genomic DNA (gDNA) extraction for the reference genome (BioSample ID SAMN41390914; Table 1) at Al Saada Reef, Abu Dhabi, United Arab Emirates (24.085250, 52.243333) in the thermally extreme southern PAG on 30 May 2022 using a Nemo underwater drill (Nemo Power Tools LLC, Nevada, USA) fitted with a diamond tip corer. The coral plug was obtained from a Coral Bleaching Automated Stress System (CBASS) [39] experiment described elsewhere [25] at the baseline temperature (34.1 °C) treatment, at time point 2 (i.e., after overnight recovery). The coral fragment was submerged in 3 ml of DNA/RNA shield (Zymo Research Europe GmbH, Freiburg, Germany) in a WhirlPak bag for at least 3 hours, after which the tissue was airpicked off the skeleton using an air pump with sterile filter tips. Tissue slurry was transferred to 2 × 2 ml cryotubes and stored at 4 °C until shipment to the lab facilities at the University of Konstanz, Germany. The sample was stored at –20 °C until extraction.
DNA was extracted using the DNeasy Blood & Tissue kit (Qiagen, Hilden, Germany) following the manufacturer’s protocol with a minor modification. Briefly, the tissue slurry was homogenized using a Polytron PT 1200 E (Kinematica, Switzerland); then, the sample was centrifuged at 5,000 × g for 3 mins, and 90 µl of the supernatant were transferred from the cryotube to a 1.5 ml tube. Next, 90 µl of ATL buffer and 20 µl of Proteinase K (Qiagen, Hilden, Germany) were added, and the sample was lysed for 1 hr at 56 °C with shaking at 300 rpm. After extraction, two elution steps were performed [40]: 50 µl of AE elution buffer (Qiagen, Hilden, Germany) were added to the membrane of the spin column, incubated for 1 min, and centrifuged at 7,000 × *g *for 1 min. A subsequent 50 µl AE buffer were added to the membrane of the spin column in the same elution tube, incubated for 5 mins, and centrifuged as previously. The quantity and quality of the DNA were assessed using a NanoDrop2000 spectrophotometer (Thermo Fisher Scientific, Waltham, Massachusetts, USA).
A total of 6.54 µg DNA were submitted to the NGS Competence Centre Tübingen (NCCT) for the construction of an ONT sequencing library using the ligation protocol LSK114 (Oxford Nanopore Technologies, Oxford, UK). The library was sequenced with an R10.4.1 flowcell on a PromethION 24 (Oxford Nanopore Technologies, Oxford, UK). Base calling was done using MinKNOW v23.07.12 (Oxford Nanopore Technologies, Oxford, UK) and Dorado v0.7.4. (RRID:SCR_025883) [41]. Base-called reads were trimmed using porechop v0.2.4 (RRID:SCR_016967) [42], and the quality was assessed using NanoPlot v1.44.1 (RRID:SCR_024128) [43]. For the genome assembly, the ONT reads were split into longer reads for assembly (minimum average quality 3, minimum length 1,000 bp) and shorter reads for polishing (minimum average quality 5, minimum length 500 bp), while removing reads mapping to algal symbionts (i.e., all publicly available reads from genomes of the dinoflagellate family Symbiodiniaceae) using chopper v0.9.0 (RRID:SCR_026486) [43]. The Symbiodiniaceae reference reads (NCBI:txid252141) were downloaded from NCBI using the NCBI Datasets command-line interface (CLI) v16.14.0 (parameters: genome taxon 252141) [44].
Sample collection, RNA isolation, and sequencing
A total of 20 samples (∼2 cm each) were drilled from 10 colonies of Porites harrisoni at Al Saada Reef (24.085250, 52.243333) in the southern PAG on 30 May 2022 for RNA isolation (Table 1) using a Nemo underwater drill (Nemo Power Tools LLC, Nevada, USA) fitted with a diamond tip corer. The coral plugs were sampled from a CBASS experiment [39] described elsewhere [25], at the baseline temperature (34.1 °C) at time points 1 and 2, i.e., after the heat-hold and after the overnight recovery. As described above, the coral fragments were submerged in 3 ml of DNA/RNA shield (Zymo Research Europe GmbH, Freiburg, Germany) in a WhirlPak bag for at least 3 hours, after which the tissue was airpicked off the skeleton using an air pump with sterile filter tips. Tissue slurry was transferred to 2 ml cryotubes and stored at 4 °C until shipment to the lab facilities at the University of Konstanz, Germany. The samples were stored at –20 °C until extraction.
RNA was extracted following a previously described protocol [45] using the RNeasy Mini Kit (Qiagen, Hilden, Germany) adapted for use with a QIAcube (Qiagen, Hilden, Germany) for standardized extraction. The quantity and quality of the RNA were assessed using a NanoDrop2000 spectrophotometer (Thermo Fisher Scientific, Waltham, Massachusetts, USA) as well as gel electrophoresis to check for the presence and integrity of rRNA bands. The extracted RNA was sent to the NCCT for library preparation using the NEBNext Ultra II Directional mRNA kit and subsequent paired-end sequencing on an Illumina NovaSeq (2 × 150 bp).
Estimation of the genome size
The genome size was estimated using a k-mer histogram generated with Meryl v1.3 (RRID:SCR_026366) [46] based on the ONT base-called and trimmed reads, and visualized in GenomeScope 2.0 (RRID:SCR_017014) [47].
Genome assembly, curation, and evaluation
The initial draft genome was assembled using a subset of size-selected ONT reads (minimum average quality 3, minimum length 1,000 bp; total number of reads: 20,260,201) with the de novo Nanopore assembler NECAT v0.0.1 (RRID:SCR_025350) [48]. The quality of the assembled draft genome was assessed using BlobToolKit v4.3.0 (RRID:SCR_025882) [49], which prompted the removal of 442 contigs due to their similarity to non-related taxa. The draft assembly was then polished using a second subset of size-selected ONT reads (minimum average quality 5, minimum length 500 bp; total number of reads: 21,513,716) with one round of Racon v1.5.0 (RRID:SCR_017642) [50] and one round using Medaka v1.12.0 (parameters: --model r1041_e82_400bps_hac_v4.3.0) [51]. After polishing, funannotate v1.8.5 (RRID:SCR_023039) [52] was used to remove duplicated contigs and those shorter than 200 bp (parameters: funannotate clean, -m 200), prompting the removal of a further five contigs. The quality of the assembly was assessed using BUSCO v6.0.0 (RRID:SCR_015008) [53] and BlobToolKit v4.3.0.
Mitochondrial genome assembly and annotation
The mitogenome was assembled using the same set of size-selected reads as above (minimum average quality 3, minimum length 1,000 bp), with a subsequent filtering step: reads were mapped to the available Porites harrisoni mitogenome (GenBank accession: MG754070, RefSeq accession: NC_037435) [54] using minimap2 v2.28 (RRID:SCR_018550) [55]. The mapped reads were then extracted using SAMtools v1.21 (RRID:SCR_002105) [56] and subsequently assembled using Canu v2.3 (RRID:SCR_015880) [57] (parameters: genomeSize=18k, -nanopore -trimmed). The resulting mitogenome was circularized using Circlator v1.5.5 (RRID:SCR_016058) (parameters: --assembler canu, --merge_min_id 85, --merge_breaklen 1000) [58]. The circularized mitogenome was then polished using Racon v1.5.0 with the shorter reads as above (minimum average quality 5, minimum length 500 bp). The mitogenome was annotated with MITOS2 [59] and manually curated in Geneious Prime v2025.1.1 (RRID:SCR_010519) (GraphPad Software, LLC, Boston, USA) against a published P. harrisoni mitogenome [54] and a mitogenome of the closely related species *Porites lobata *[60].
Repeat annotation
DNA repeats in the genome were identified using a combination of RepeatModeler v2.0.4 (RRID:SCR_015027) [61] and EDTA v2.2.2 (RRID:SCR_022063) [62] and masked using RepeatMasker v4.1.6 (RRID:SCR_012954) [63].
Gene prediction and annotation
Gene prediction was performed using BRAKER3 v3.0.8 (RRID:SCR_018964) [64]. First, the transcript evidence was prepared as follows. After quality checking the RNA-Seq reads using FastQC v0.12.1 (RRID:SCR_014583) [65], the reads were trimmed using Trimmomatic v0.39 (RRID:SCR_011848) [66] and subsequently mapped to the *Porites harrisoni *assembly using STAR v2.7.11b (RRID:SCR_004463) [67]. The resulting mapping files were merged, and the reverse strand-specific RNA reads were extracted using SAMtools v1.21. Second, curated Metazoa proteomes were retrieved from OrthoDB 11 [68] and used as protein evidence as recommended by the BRAKER3 pipeline [64, 69, 70]. Lastly, BRAKER3 was run using the reverse strand-specific RNA mapping files and protein evidence as input with the masked genome assembly (parameters: --busco_lineage=metazoa_odb10, --species=Porites_harrisoni) [53, 71–80]. Using the script stringtie2utr.py from the BRAKER3 suite, untranslated regions (UTRs) were added to the annotation file output by BRAKER3. In addition, 674 eukaryotic high-confidence transfer RNAs were predicted using tRNAscan-SE v2.0.12 (RRID:SCR_008637) [81] based on filtering of the initial set of 5,938 putative tRNAs using EukHighConfidenceFilter [81]. The high-confidence set of tRNAs was merged with the BRAKER3 structural predictions. Then, the annotation file was checked for overlapping genes using AGAT (RRID:SCR_027223) [82] and validated using GenomeTools v1.6.5 (RRID:SCR_016120) [83]. Finally, GffRead (RRID:SCR_018965) [77] was used to extract the predicted protein sequences from the merged file for functional annotation. The predicted genes were annotated using InterProScan v5.59-91.0 (RRID:SCR_005829) [84], EggNOG-mapper v2.1.12 (RRID:SCR_021165) [85, 86], and Phobius v1.01 (RRID:SCR_015643) [87]. The respective annotation files were then fed into funannotate, with the predicted genes in the GFF3 file format, to synthesize all annotations (parameters: --species "Porites harrisoni" --busco_db metazoa, --eggnog --iprscan --phobius). The final annotation was assessed using BUSCO v6.0.0. The gene prediction and functional annotation outlined in this study were used to develop the fully automated annotation pipeline GeneForge v1.0 [88].
Results and discussion
Genome assembly
The genome was sequenced from a single Porites harrisoni colony, collected from Al Saada Reef in the southern PAG on 30 May 2022 (Table 1). A total of 22,825,985 ONT long-reads were obtained from an R10.4.1 flow cell (Oxford Nanopore Technologies, Oxford, UK) following adapter removal and trimming, with a mean read length of 5,071 bp (median 3,407 bp) and a read length N50 of 7,424 bp. The reads showed a mean base call quality score of 14.9, and the longest read sequenced was 991,055 bp with a quality score of 10.
The estimated genome size using a k-mer approach (k = 21) was 599 Mb (Figure 2). The analysis revealed a diploid genome (p = 2) that is predominantly homozygous (98% AA) with very little heterozygosity (1.96% AB). While GenomeScope profiles are not optimized for standard ONT reads (given the lower accuracy in comparison to short reads), the use of ONT-derived k-mer counts still provides a valuable and informative approximation of genome size, particularly in the absence of complementary short-read sequencing. The genome size estimated here is consistent with that of other assembled Porites genomes [35].
Genome-size estimation-plot of the Porites harrisoni genome. The genome-size estimation-plot was generated from a k-mer distribution (k = 21) using Meryl v1.3 and GenomeScope 2.0, based on ONT reads with a low error rate (1.02%). The analysis revealed a diploid genome (p = 2) with a size of 599 Mb, largely unique (50.3%) and predominantly homozygous (98% AA), with very little heterozygosity (1.96% AB).
The initial draft assembly with very high completeness (see ‘Data validation’) was decontaminated (i.e., contigs mapping to non-cnidarian species were removed; Figure 3) and polished, and contigs shorter than 200 bp were removed to improve accuracy and eliminate spurious fragments. Additionally, the mitogenome of P. harrisoni was assembled as a single contig using Canu [57] and circularized using Circlator v1.5.5 [58], resulting in a total length of 18,639 bp closely matching a published P. harrisoni mitogenome from the southern Red Sea [54].
Quality assessment of the Porites harrisoni genome assembly using BlobToolKit v4.3.0. (A) GC-coverage plot showing sequence coverage (vertical axis) and GC content (horizontal axis) following decontamination (i.e., removal of contigs mapping to non-cnidarian species). Contigs are colored by phylum (light blue = Cnidaria, dark blue = no hit); circles are sized in proportion to contig length. Histograms show the distribution of contig length sums along each axis. (B) Cumulative sequence length of the final assembly after removal of contigs from non-target species. Curves show cumulative length for all contigs with colors denoting phylum (light blue = Cnidaria, dark blue = no hit).
The final genome assembly (nuclear genome and mitogenome) covered 1,883 contigs with a total length of 626.7 Mb and a contig N50 of 807.4 kb (Figure 4; Table 2). Repetitive sequences identified through a combination of RepeatModeler and EDTA [61, 62] comprised 371,184,681 bp (59.23% of the assembled genome), including retroelements (15.89%), DNA transposons (10.00%), and unclassified repeats (31.71%; Table 3). This number of repetitive sequences is higher than for other Porites genomes, which have between 40 to 50% repeat content [35], and is likely due to the combined use of multiple repeat-identification methods rather than a single approach.
Assembly of the Porites harrisoni genome. The snail plot generated by BlobToolKit v4.3.0 is divided into 1,000 size-ordered bins around the circumference, with each bin representing 0.1% of the 626,672,702 bp assembly. The distribution of sequence lengths is shown in dark grey with the plot radius scaled to the longest sequence present in the assembly (5,540,956 bp, shown in red). Orange and pale-orange arcs show the N50 (807,480 bp) and N90 (172,108 bp) sequence lengths, respectively. The pale grey spiral shows the cumulative sequence count on a log scale, with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT, and N percentages in the same bins as the inner plot. A summary of complete, duplicated, fragmented, and missing BUSCO genes in the eukaryota_odb10 set is shown at the top right (Table 2).
Genome annotation
The Porites harrisoni genome assembly was annotated using the BRAKER3 pipeline [64, 89], and functional annotation was done using funannotate [52]. The resulting annotation included 27,823 protein-coding genes and 674 high-confidence eukaryotic tRNAs. The average gene length was 10,193 bp, and the average protein length was 479 amino acids (aa; Table 4). The number of protein-coding genes is lower than in other Porites genomes (between 30,000 and 40,000 genes; [13, 35]), but is comparable to other coral genomes from the complex and robust clade [32, 90–92]. BUSCO completeness comparisons indicate a higher proportion of missing conserved genes in the *P. harrisoni *assembly (12.5% versus ∼1–3% in other published Porites genomes), suggesting that some genes may be absent due to assembly fragmentation, limited sequencing depth, or sequencing technology differences (e.g., short-read, long-read, hybrid assembly).
Additionally, the mitogenome was annotated using MITOS2 [59] and manually curated in Geneious Prime (GraphPad Software, LLC, Boston, USA). The mitogenome contains 13 protein-coding genes, 2 tRNAs (trnM, trnW), and 2 rRNAs (rnL, rnS; Figure 5). Two large Group I introns interrupting the nad5 and cox1 genes, with 11,131 bp and 972 bp in length, respectively, are present. By comparison, a published P. harrisoni mitogenome from the southern Red Sea [54] has the same 13 protein-coding genes, the same 2 tRNA and 2 rRNA genes, and 2 similar Group I introns.
Mitochondrial genome (mitogenome) of Porites harrisoni. The direction of the arrows indicates the read orientation of the respective genes. The annotated features are demarcated by color: teal indicates the annotated genes (n = 13), light green denotes their corresponding coding sequences (CDS); tRNAs (n = 2) are in dark pink, rRNAs (n = 2) in red, and their respective exons are in grey (n = 4); introns interrupting the nad5 and cox1 genes (n = 2) are in white.
Data validation
The initial draft assembly of the Porites harrisoni genome was 792 Mb in size and exhibited a high level of completeness, with a BUSCO score of 98.0% (single = 81.1%, duplicated = 16.9%, fragmented = 1.2%, missing = 0.8%) based on the eukaryota_odb10 reference set (n = 255), indicating that the majority of expected single-copy orthologs was successfully recovered. Following the removal of contigs not aligning to cnidarian sequences to ensure taxonomic accuracy and removal of non-coral sequences, the completeness was reassessed using BUSCO. This reassessment resulted in a completeness score of 86.3% (single = 72.5%, duplicated = 13.7%, fragmented = 1.2%, missing = 12.5%) utilizing the eukaryota_odb10 reference set (n = 255; Table 2). Further refinements of the *P. harrisoni *assembly could benefit from incorporating additional long-read (e.g., PacBio HiFi) and/or short-read resequencing data to improve contiguity and resolution, especially for unequivocally determining the definitive number of protein-coding genes; the integration of Hi-C data could aid in generating a chromosome-level assembly (e.g., [93]), although misjoints may be introduced if the Hi-C data is of insufficient quality. Subsequent evaluation of completeness based on the set of annotated protein-coding genes yielded a BUSCO completeness of 83.5% (single = 61.2%, duplicated = 22.3%, fragmented = 2.4%, missing = 14.1%) using the eukaryota_odb10 reference set (n = 255; Table 4). Although other Porites genomes show higher completeness scores [13, 35], the results in this study confirm that the final Porites harrisoni assembly and annotation represent a high-quality and taxonomically validated coral genome, with the reduction in BUSCO completeness attributable to the deliberate exclusion of non-cnidarian sequences, rather than a reduction in genomic accuracy.
Conclusion
The genome presented in this study represents the first assembled genome of *Porites harrisoni *from the thermally extreme environment of the southern PAG. As one of the world’s most thermally resilient coral species, the genome of this coral provides a critical reference for investigating the genomic basis of heat tolerance, rapid adaptation, and host-algal symbiosis under severe thermal stress. The availability of a high-quality and complete mitochondrial genome from the PAG enables comparative genomic analyses across coral species and populations, facilitating the identication of adaptive loci. Placed alongside the increasing suite of high-quality reference genomes for corals, e.g., [32, 35, 90–92, 94] and other non-model taxa [95–97], this genome strengthens the expanding genomic foundation that is driving advances in evolutionary biology and conservation science. Moreover, this genome serves as a foundation for future studies on coral evolutionary genomics and population structure of *P. harrisoni *and will support studies aimed at predicting coral responses to ongoing ocean warming.
Availability of source code
All bioinformatic tools used in this study are listed in Table 5; all curated pipelines are available on GitHub at https://github.com/SequAna-Ukon/Porites_harrisoni_genome (MIT license). The pipelines are implemented in Bash and run on Unix-like operating systems.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Riegl BM, Purkis SJ. Coral reefs of the gulf: adaptation to climatic extremes in the world’s hottest sea. Coral Reefs of the World. Dordrecht: Springer Netherlands, 2012; pp. 1–4. doi:10.1007/978-94-007-3008-3_1. · doi ↗
- 2Burt JA, Camp EF, Enochs IC Insights from extreme coral reefs in a changing world. Coral Reefs, 2020; 39(3): 495–507. doi:10.1007/s 00338-020-01966-y. · doi ↗
- 3Camp EF, Suggett DJ, Pogoreutz C Corals exhibit distinct patterns of microbial reorganisation to thrive in an extreme inshore environment. Coral Reefs, 2020; 39(3): 701–716. doi:10.1007/s 00338-019-01889-3. · doi ↗
- 4Coles SL, Riegl BM. Thermal tolerances of reef corals in the Gulf: a review of the potential for increasing coral survival and adaptation to climate change through assisted translocation. Mar. Poll. Bull., 2013; 72(2): 323–332. doi:10.1016/j.marpolbul.2012.09.006. 23058810 · doi ↗ · pubmed ↗
- 5Vaughan GO, Al-Mansoori N, Burt JA. Chapter 1 - The Arabian Gulf. Amsterdam: Elsevier Ltd, 2018; pp. 1–23. doi:10.1016/B 978-0-08-100853-9.00001-4. · doi ↗
- 6Hume BCC, D’Angelo C, Smith EG Symbiodinium thermophilum sp. nov., a thermotolerant symbiotic alga prevalent in corals of the world’s hottest sea, the Persian/Arabian Gulf. Sci. Rep., 2015; 5(1): 8562. doi:10.1038/srep 08562. 25720577 PMC 4342558 · doi ↗ · pubmed ↗
- 7Burt JA. Coral reefs of the Emirates. A Natural History of the Emirates. Cham: Springer Nature Switzerland, 2024; pp. 325–351. doi:10.1007/978-3-031-37397-8_11. · doi ↗
- 8Bhattacharya D, Agrawal S, Aranda M Comparative genomics explains the evolutionary success of reef-forming corals. e Life, 2016; 5: 13288. doi:10.7554/e Life.13288. PMC 487887527218454 · doi ↗ · pubmed ↗
