High-quality genome assembly and annotation of five bacteria isolated from the Abu Dhabi sabkha-shore region
Beenish Sarfraz, Jean Tuyisabe, Louis De Montfort, Abdulrahman Ibrahim, Shamma Z. Abdulkreem Almansoori, Haya Alajami, Asma Almeqbaali, Biduth Kundu, Vishnu Sukumari Nath, Esam Eldin Saeed, Ajay Kumar Mishra, Khaled Michel Hazzouri, Raja Almaskari, Abhishek Kumar Sharma

TL;DR
This study provides complete genome sequences of five bacteria from the extreme Abu Dhabi sabkha environment, which could help in developing microbes for industrial and agricultural uses.
Contribution
The novel contribution is the high-quality genome assembly and annotation of five bacterial isolates from a polyextreme environment.
Findings
Five bacterial isolates were identified and assigned to species including Staphylococcus capitis and Bacillus spizizenii.
Hybrid sequencing techniques produced complete, gap-free genome sequences ranging from 2.4 Mb to 5.05 Mb.
The genomes may aid in engineering microbes for bioremediation and plant growth under salinity stress.
Abstract
Sabkhas represent polyextreme environments characterized by elevated salinity levels, intense ultraviolet (UV) radiation exposure, and extreme temperature fluctuations. In this study, we present the complete genomes of five bacterial isolates obtained from the sabkha-shore region and investigate their genomic organization and gene annotations. A better understanding of the genomic organization and genetic adaptations of these bacteria holds promise for engineering microbes with tailored functionalities for diverse industrial and agricultural applications, including bioremediation and promotion of plant growth under salinity stress conditions. We present a comprehensive genome sequencing and annotation of five bacteria (kcgeb_sa, kcgeb_sc, kcgeb_sd, kcgeb_S4, and kcgeb_S11) obtained from the shores of the Abu Dhabi Sabkha region. Initial bacterial identification was conducted…
Taxonomy
Topics: Genomics and Phylogenetic Studies · Microbial Community Ecology and Physiology · Enzyme Production and Characterization
Objective
Sabkhas, also known as salt flats, are polyextreme environments with high temperatures, salinities, and light intensities, and they occur globally in arid regions of the Middle East, North Africa, the USA, and Australia. Their extreme conditions make them a challenging environment for the survival of plants, animals, and other organisms [1, 2]. Despite these harsh conditions, salt flats host remarkably robust and diverse microbial communities that are highly adaptable, metabolically diverse, and resilient to abiotic stress [3–5].
Previously, we cataloged the rich microbial diversity and distribution dynamics of the Abu Dhabi sabkha region using a combination of 16S rDNA profiling and whole-genome metagenomic approaches [6]. However, high-quality complete genome sequences of bacteria isolated from the Abu Dhabi sabkha region remain scarce. In this study, we therefore present complete genome sequences and gene annotations for five bacterial isolates from the Abu Dhabi sabkha-shore region that exhibit high salt tolerance. The genomic resources and datasets generated in this study will serve as a valuable repository for exploring genes and pathways associated with abiotic stress tolerance and for understanding the mechanisms that bacteria use to survive in extreme environments. Furthermore, the information gleaned from these bacterial species could be exploited in comparative genomics research programs and could pave the way for engineering microbes with high plant growth promotion activity for enhanced performance under high salt-stress conditions, opening up new avenues for sustainable agriculture to feed a burgeoning population.
Data description
Methodology
The five bacterial isolates used for whole-genome sequencing (WGS) were isolated from soil samples collected from the Abu Dhabi sabkha-shore region. Details on the systematic sample collection, bacterial culture strategy, and storage procedure are described in our previously published report [6]. A snapshot of our data analysis workflow is presented in Table 1 (Data file 1).
High-quality DNA isolation, quantitation, quality checks and 16S rDNA-amplicon-based bacterial species identification were carried out according to our previously published methods [7]. The complete 16S rRNA gene (~1.5 kb) of each isolate was amplified and sequenced using the universal primers 27F and 1492R. On this basis, the isolates were identified as Staphylococcus capitis (kcgeb_sa; 100% identity and E-value = 0), Bacillus spizizenii (kcgeb_sc; 99.5% identity and E-value = 0), Pelagerythrobacter marensis (kcgeb_sd; 100% identity and E-value = 0), Priestia aryabhattai (kcgeb_S4; 98% identity and E-value = 0) and a member of the genus Bacillus (kcgeb_S11; 97.53% identity and E-value = 0).
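To give a sense of scale for the identity values above, the following minimal Python sketch computes percent identity over a pairwise alignment and the approximate number of non-identical columns that a given identity implies. It assumes identity is defined as identical columns divided by alignment length, as in BLAST output; the toy sequences and the 1,500-column alignment length are illustrative assumptions, not values from the study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over all columns of a pairwise alignment.

    Gap columns ('-') count as mismatches, so the denominator is the
    full alignment length (the convention assumed here).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def expected_mismatches(identity_pct: float, alignment_length: int) -> int:
    """Approximate count of non-identical columns implied by an identity."""
    return round(alignment_length * (1.0 - identity_pct / 100.0))


# Toy alignment (hypothetical sequences, not from the study).
toy_identity = percent_identity("ACGTACGT-A", "ACGTTCGTGA")

# Rough scale of the reported 97.53% hit for kcgeb_S11, assuming a
# 1,500-column alignment (the actual alignment length is not stated).
s11_mismatches = expected_mismatches(97.53, 1500)
```

Under these assumptions, a 97.53% hit over the full-length gene corresponds to a few dozen differing positions, which is consistent with the cautious genus-level call made for kcgeb_S11 from 16S data alone.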
For WGS, shotgun and long-read libraries were prepared as previously described [7] and sequenced on an Illumina NovaSeq 6000 (paired-end reads, 150 bp) and a MinION, respectively. The sequencing read statistics generated for each isolate are summarized in Data file 2 (Table 1). Trimmomatic v.0.39 [8] was used to trim low-quality bases and adapters from the raw Illumina reads, whereas the ONT MinION reads were error-corrected and trimmed using Canu [9]. Whole genomes were then assembled with a hybrid approach using the Unicycler pipeline [10]. The assembled genomes were polished with Illumina and ONT reads using Pilon v.1.23 [11]. Plausible plasmid sequences were extracted from the genome assembly using a homology-based approach. In addition, the species assignment of each assembly was confirmed using the average nucleotide identity (ANI) method [12]. Gene prediction and annotation of the assembled genomes were performed using the Prokka/NCBI-PGAP tools [13, 14].
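The read-trimming and hybrid-assembly steps described above can be sketched as command lines built in Python. This is only an illustration: the paper does not report the parameters used, so the Trimmomatic trimming steps (`ILLUMINACLIP`, `SLIDINGWINDOW`, `MINLEN` values), the thread count, and every file name below are assumptions. The Canu correction, Pilon polishing, ANI and annotation steps are omitted because their settings are likewise not given.

```python
import shlex


def paired_outputs(out_dir: str) -> tuple:
    """Hypothetical output names for Trimmomatic paired-end mode."""
    return (
        f"{out_dir}/R1.paired.fq.gz",
        f"{out_dir}/R1.unpaired.fq.gz",
        f"{out_dir}/R2.paired.fq.gz",
        f"{out_dir}/R2.unpaired.fq.gz",
    )


def trimmomatic_cmd(jar, r1, r2, out_dir, adapters, threads=4):
    """Build a Trimmomatic PE command (trimming values are assumed)."""
    p1, u1, p2, u2 = paired_outputs(out_dir)
    return [
        "java", "-jar", jar, "PE", "-threads", str(threads),
        r1, r2, p1, u1, p2, u2,
        f"ILLUMINACLIP:{adapters}:2:30:10",  # adapter clipping
        "SLIDINGWINDOW:4:20",                # quality trimming window
        "MINLEN:36",                         # drop very short reads
    ]


def unicycler_cmd(r1_paired, r2_paired, long_reads, out_dir, threads=4):
    """Build a Unicycler hybrid-assembly command (short + long reads)."""
    return [
        "unicycler",
        "-1", r1_paired,
        "-2", r2_paired,
        "-l", long_reads,
        "-o", out_dir,
        "-t", str(threads),
    ]


# All paths are placeholders for one isolate.
trim = trimmomatic_cmd(
    "trimmomatic-0.39.jar",
    "kcgeb_sa_R1.fq.gz",
    "kcgeb_sa_R2.fq.gz",
    "trimmed",
    "TruSeq3-PE.fa",
)
r1_paired, _, r2_paired, _ = paired_outputs("trimmed")
asm = unicycler_cmd(r1_paired, r2_paired, "kcgeb_sa_ont.corrected.fq.gz", "assembly")

# Print the commands for inspection; nothing is executed here.
print(shlex.join(trim))
print(shlex.join(asm))
```

Building the commands as argument lists keeps them easy to inspect and log, and they could be passed to `subprocess.run` once the tools and real input files are available.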
Our hybrid assembly strategy produced a gap-free, high-quality single circular genome for all five bacterial isolates. The isolate kcgeb_sa, identified as Staphylococcus capitis, had a genome size of 2,471,401 bp (G + C: ~33.1%), a BUSCO score of 100%, and 2484 genes, including 2340 protein-coding, 63 tRNA, 22 rRNA, and 5 ncRNA genes, together with two plasmids of 47,919 bp and 3530 bp (Table 1, Data files 3, 4, 5, 6 and 7).
The isolate kcgeb_sc was identified as Bacillus spizizenii, with a genome size of 4,130,445 bp (G + C: ~43.9%), a BUSCO score of 100%, and 4179 genes, including 3963 protein-coding, 86 tRNA, 30 rRNA, and 5 ncRNA genes (Table 1, Data files 8, 9 and 10).
The isolate kcgeb_sd was identified as Pelagerythrobacter marensis, with a genome size of 2,902,066 bp (G + C: ~66.38%), one plasmid (7769 bp), a BUSCO score of ~98.4%, and 2774 genes, including 2728 protein-coding, 46 tRNA, 3 rRNA, and 3 ncRNA genes (Table 1, Data files 11, 12, 13 and 14).
The isolate kcgeb_S4 was identified as Priestia aryabhattai, with a genome size of 5,052,464 bp (G + C: ~38%), a BUSCO score of ~93.5%, and 5247 genes, including 5056 protein-coding, 99 tRNA, 37 rRNA, and 8 ncRNA genes (Table 1, Data files 15, 16 and 17).
The isolate kcgeb_S11 was identified as Bacillus spizizenii, with a genome size of 4,130,172 bp (G + C: ~43.9%), a BUSCO score of 100%, and 4178 genes, including 3962 protein-coding, 86 tRNA, 30 rRNA, and 5 ncRNA genes (Table 1, Data files 18, 19 and 20).
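The per-isolate figures above lend themselves to a quick arithmetic check. The sketch below records the reported chromosome sizes and gene counts and computes, for each isolate, how many genes remain after subtracting the four itemized categories from the stated total. The class and field names are illustrative; the interpretation of the remainder (for example, pseudogenes or other features not itemized in the text) is an assumption and should be checked against the corresponding statistics data files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomeStats:
    """Figures reported in the text for one isolate's chromosome."""
    isolate: str
    size_bp: int
    total_genes: int
    protein_coding: int
    trna: int
    rrna: int
    ncrna: int

    @property
    def itemized(self) -> int:
        """Sum of the four gene categories listed in the text."""
        return self.protein_coding + self.trna + self.rrna + self.ncrna

    @property
    def remainder(self) -> int:
        """Stated total minus itemized categories (may be negative)."""
        return self.total_genes - self.itemized


REPORTED = [
    GenomeStats("kcgeb_sa", 2_471_401, 2484, 2340, 63, 22, 5),
    GenomeStats("kcgeb_sc", 4_130_445, 4179, 3963, 86, 30, 5),
    GenomeStats("kcgeb_sd", 2_902_066, 2774, 2728, 46, 3, 3),
    GenomeStats("kcgeb_S4", 5_052_464, 5247, 5056, 99, 37, 8),
    GenomeStats("kcgeb_S11", 4_130_172, 4178, 3962, 86, 30, 5),
]

remainders = {g.isolate: g.remainder for g in REPORTED}

# The two Bacillus spizizenii assemblies are nearly the same length.
by_name = {g.isolate: g for g in REPORTED}
size_gap_bp = by_name["kcgeb_sc"].size_bp - by_name["kcgeb_S11"].size_bp

for g in REPORTED:
    print(
        f"{g.isolate:<10} total={g.total_genes:>5} "
        f"itemized={g.itemized:>5} remainder={g.remainder:>4}"
    )
```

With the numbers as printed in the text, four isolates leave a positive remainder, whereas the itemized categories for kcgeb_sd add up to slightly more than the stated total, so that entry is worth confirming against Data file 11.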
Table 1. Overview of the data files/datasets

| Label | Name of data file/dataset | File types (file extension) | Data repository and identifier (DOI or accession number) |
|---|---|---|---|
| Data file 1 | Data analysis workflow used for whole genome sequencing of bacterial isolates | PDF | Figshare: 10.6084/m9.figshare.25816543.v1 [15] |
| Data file 2 | Raw data (Illumina and MinION) details | Excel | Figshare: 10.6084/m9.figshare.25838296.v1 [16] |
| Data file 3 | Staphylococcus capitis (kcgeb_sa) genome assembly and annotation statistics | Excel | Figshare: 10.6084/m9.figshare.25975564.v1 [17] |
| Data file 4 | NGS data for Staphylococcus capitis (kcgeb_sa) | Web link | NCBI data: http://identifiers.org/insdc.sra:SRP378207 [18] |
| Data file 5 | Genome sequence of Staphylococcus capitis (kcgeb_sa) | Web link | NCBI data: http://identifiers.org/insdc:CP145595.1 [19] |
| Data file 6 | Plasmid sequence of Staphylococcus capitis (kcgeb_sa) | Web link | NCBI data: http://identifiers.org/insdc:CP145596.1 [20] |
| Data file 7 | Plasmid sequence of Staphylococcus capitis (kcgeb_sa) | Web link | NCBI data: http://identifiers.org/insdc:CP145597.1 [21] |
| Data file 8 | Whole genome statistics of Bacillus spizizenii (kcgeb_sc) | Excel | Figshare: 10.6084/m9.figshare.25557906.v1 [22] |
| Data file 9 | NGS data of Bacillus spizizenii (kcgeb_sc) | Web link | NCBI data: http://identifiers.org/insdc.sra:SRP377107 [23] |
| Data file 10 | Genome sequence of Bacillus spizizenii (kcgeb_sc) | Web link | NCBI data: http://identifiers.org/insdc:CP145137.1 [24] |
| Data file 11 | Whole genome statistics of Pelagerythrobacter marensis (kcgeb_sd) | Excel | Figshare: 10.6084/m9.figshare.25557891.v1 [25] |
| Data file 12 | NGS data of Pelagerythrobacter marensis (kcgeb_sd) | Web link | NCBI data: http://identifiers.org/insdc.sra:SRP377106 [26] |
| Data file 13 | Genome sequence of Pelagerythrobacter marensis (kcgeb_sd) | Web link | NCBI data: http://identifiers.org/insdc:CP144918.1 [27] |
| Data file 14 | Plasmid sequence of Pelagerythrobacter marensis (kcgeb_sd) | Web link | NCBI data: http://identifiers.org/insdc:CP144919.1 [28] |
| Data file 15 | Whole genome statistics of Priestia aryabhattai (kcgeb_S4) | Excel | Figshare: 10.6084/m9.figshare.25557897.v1 [29] |
| Data file 16 | NGS data of Priestia aryabhattai (kcgeb_S4) | Web link | NCBI data: http://identifiers.org/insdc.sra:SRP489214 [30] |
| Data file 17 | Genome sequence of Priestia aryabhattai (kcgeb_S4) | Web link | NCBI data: http://identifiers.org/insdc:CP145138.1 [31] |
| Data file 18 | Whole genome statistics of Bacillus spizizenii (kcgeb_S11) | Excel | Figshare: 10.6084/m9.figshare.25557900.v1 [32] |
| Data file 19 | NGS data of Bacillus spizizenii (kcgeb_S11) | Web link | NCBI data: http://identifiers.org/insdc.sra:SRP489215 [33] |
| Data file 20 | Genome sequence of Bacillus spizizenii (kcgeb_S11) | Web link | NCBI data: http://identifiers.org/insdc:CP145722.1 [34] |
Limitations
We used a hybrid genome assembly method with high-coverage WGS data (both long and short reads) to produce a gap-free, high-quality single circular genome for each of the bacterial isolates. In addition, we used Illumina and ONT MinION reads to error-correct and polish the assembled genomes, and we assessed the completeness of the final assemblies with the Benchmarking Universal Single-Copy Orthologs (BUSCO) v.4.1.4 tool [35], which confirmed that the assemblies were complete. As a result, we are unaware of any limitations in our genome assembly and annotation approaches.
Nevertheless, this data note is limited to the description and annotation of high-quality genomes of five bacteria isolated from the Abu Dhabi sabkha-shore region. More in-depth research is needed to understand the phylogenetics, gene functions, and metabolic pathways of these isolates, as well as the distinct biosynthetic gene clusters that allow them to survive in harsh environments.
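For readers who want to tabulate the BUSCO completeness check mentioned above across assemblies, the following Python sketch parses the one-line summary that BUSCO writes to its short-summary file. The line format is assumed to follow the usual `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` pattern, and the example line is invented for illustration; it is not output from this study.

```python
import re

# Assumed layout of the BUSCO one-line summary:
# C = complete, S = single-copy, D = duplicated,
# F = fragmented, M = missing, n = number of BUSCOs searched.
_BUSCO_SUMMARY = re.compile(
    r"C:(?P<C>\d+(?:\.\d+)?)%"
    r"\[S:(?P<S>\d+(?:\.\d+)?)%,D:(?P<D>\d+(?:\.\d+)?)%\],"
    r"F:(?P<F>\d+(?:\.\d+)?)%,"
    r"M:(?P<M>\d+(?:\.\d+)?)%,"
    r"n:(?P<n>\d+)"
)


def parse_busco_summary(line: str) -> dict:
    """Return the percentages and BUSCO count from a summary line."""
    match = _BUSCO_SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    fields = {key: float(match.group(key)) for key in "CSDFM"}
    fields["n"] = int(match.group("n"))
    return fields


# Hypothetical summary line (values invented for illustration).
example_line = "C:98.4%[S:98.4%,D:0.0%],F:0.0%,M:1.6%,n:124"
parsed = parse_busco_summary(example_line)
print(parsed)
```

A simple sanity check on parsed values is that complete, fragmented and missing percentages sum to roughly 100, allowing for rounding in the report.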
References
- 1. Alnuaim A, Alsanabani N, Alshenawy A. Monotonic and cyclic behavior of salt-encrusted flat (sabkha) soil. Int J Civil Eng. 2021;19:187–98. doi:10.1007/s40999-020-00561-0
- 2. Alshenawy AO, Hamid WM, Alnuaim AM. A review on the characteristics of sabkha soils in the Arabian Gulf Region. Arab J Geosci. 2021;14:1–15. doi:10.1007/s12517-021-08275-w
- 3. Dong H, Yu B. Geomicrobiological processes in extreme environments: a review. Episodes J Int Geoscience. 2007;30(3):202–16.
- 4. Al Disi ZA, Jaoua S, Bontognali TR, Attia ES, Al-Kuwari HAAS, Zouari N. Evidence of a role for aerobic bacteria in high magnesium carbonate formation in the evaporitic environment of Dohat Faishakh Sabkha in Qatar. Front Environ Sci. 2017;5:1. doi:10.3389/fenvs.2017.00001
- 5. Edwards HG, Mohsin MA, Sadooni FN, Nik Hassan NF, Munshi TJA. Life in the sabkha: Raman spectroscopy of halotrophic extremophiles of relevance to planetary exploration. Anal Bioanal Chem. 2006;385:46–56. doi:10.1007/s00216-006-0396-3. PMID: 16607492
- 6. Hazzouri KM, Sudalaimuthuasari N, Saeed EE, Kundu B, Al-Maskari RS, Nelson D, Al Shehhi AA, Aldhuhoori MA, Almutawa DS, Alshehhi FR. Salt flat microbial diversity and dynamics across salinity gradient. Sci Rep. 2022;12(1):11293. doi:10.1038/s41598-022-15347-8. PMID: 35788147. PMCID: PMC9253026
- 7. Salha Y, Sudalaimuthuasari N, Kundu B, Al Maskari RS, Alkaabi AS, Hazzouri KM, Abu Qamar SF, El-Tarabily KA, Amiri KM. Complete genome sequence of Phytobacter diazotrophicus strain UAEU 22, a plant growth-promoting bacterium isolated from the date palm rhizosphere. Microbiol Resour Announc. 2020;9(25). doi:10.1128/MRA.00499-20. PMID: 32554791. PMCID: PMC7303411
- 8. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20. doi:10.1093/bioinformatics/btu170. PMID: 24695404. PMCID: PMC4103590
