Draft genome sequence of Bacillus safensis 2T-2, isolated from drinking water
Cornelius C. Bezuidenhout, Lesego G. Molale-Tom, Rinaldo K. Kritzinger, Oluwaseyi Samuel Olanrewaju

TL;DR
This paper reports the genome sequence of a Bacillus safensis strain from drinking water, revealing antibiotic resistance and potential for producing useful compounds.
Contribution
First report of Bacillus safensis in treated drinking water with detailed genomic and biosynthetic analysis.
Findings
Genome has 3.78 Mb with 4000 coding sequences and six antibiotic resistance genes.
Strain shows low pathogenic potential but has 12 secondary metabolite gene clusters for antimicrobial and biosurfactant production.
Phylogenetic analysis confirms the strain as B. safensis, closely related to B. safensis F0–36b.
Abstract
Bacillus safensis 2T-2 was isolated from potable water at a municipal water treatment facility in the North West province of South Africa, representing the first report of this species in treated drinking water systems. Whole genome sequencing revealed a 3.78 Mb genome with 41.3 % GC content and 4000 coding sequences distributed across 126 contigs. Genome analysis identified six antibiotic resistance genes, including vancomycin resistance genes (vanT, vanY), fosfomycin resistance (fosBx1), chloramphenicol resistance (cat86), and two disinfectant resistance genes (qacG, qacJ). Despite the presence of resistance genes, PathogenFinder analysis confirmed low pathogenic potential (0.168 probability). The strain demonstrated significant biosynthetic capabilities with 12 secondary metabolite gene clusters, including antimicrobial compound production (plantazolicin), biosurfactants…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBacterial biofilms and quorum sensing · Bacillus and Francisella bacterial research · Genomics and Phylogenetic Studies
Specifications TableSubjectBiologySpecific subject areaMicrobiology, Microbial Genomics, Molecular BiologyType of dataRaw and analyzed whole genomic sequence data represented by Tables and FiguresData collectionBacterial strain was isolated from potable water on nutrient agar at 37 °C. Genomic DNA was extracted using chemagic DNA bacteria kit (PerkinElmer). DNA libraries were prepared using Nextera XT kit and sequenced on Illumina MiSeq 300 platform with paired-end reads (2 × 300 bp). Raw reads were quality-checked with FastQC, trimmed using Trimmomatic, and assembled using SPAdes. Genome annotation was performed using RAST pipeline, followed by identification of antibiotic resistance genes (CARD database), secondary metabolites (antiSMASH), mobile genetic elements (mobileOG-db), and phylogenetic analysis (TYGS).Data source locationUnit for Environmental Sciences and Management, North-West University, Potchefstroom Campus, Private Bag X6001, 2520, Potchefstroom, South AfricaData accessibilityThis Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBankunder the accession JBNJOE000000000. The version described in this paperis version JBNJOE010000000.Direct URL to data: https://www.ncbi.nlm.nih.gov/nuccore/JBNJOE000000000Bioproject: PRJNA1247746Biosample: SAMN47828673
Value of the Data
1
- •The data is valuable for understanding the safety of potable water by providing insight into the genome of B. safensis 2T-2 isolated from the water treatment plant.
- •The identification of antibiotic-resistant and virulence genes in B. safensis 2T-2 is important in epidemiological surveys and understanding the need for public safety.
- •The presence of six antibiotic resistance genes in bacteria surviving current water treatment processes highlights the need for enhanced strategies to prevent resistance gene dissemination through water distribution systems.
- •The high‐quality draft genome, supported by deep sequencing coverage (153x), high completeness (99.59 %), and low contamination (1.64 %), provides a robust framework for pan-genome and evolutionary analyses.
- •The detection of secondary-metabolite clusters offers a rich source for new bioactive compounds
Background
2
Bacillus safensis (B. safensis) is a Gram-positive, spore-forming bacterium recognized for its versatility and diverse metabolic capabilities [1]. B. safensis, originally isolated from spaceship cleanrooms [2], has been detected in diverse environmental niches [3,4]. This study represents the first report of B. safensis in treated drinking water. While extensive research has characterized B. safensis from spacecraft and terrestrial environments [5,6], its presence and genetic characteristics in water treatment systems remain unexplored. This knowledge gap is critical given the increasing concern over antibiotic resistance proliferation through water distribution networks and the potential biotechnological applications of stress-resistant bacteria in water treatment enhancement. The emergence of antibiotic-resistant bacteria in treated water systems poses significant public health risks, as these environments serve as reservoirs for resistance gene dissemination [7,8]. Genomic characterization of bacteria surviving water treatment processes is essential for understanding treatment efficacy limitations and developing enhanced disinfection strategies. Its resilient physiology enables survival and function in adverse settings, making it a viable choice for environmental and biotechnological applications. Genome mining has identified genes in the production of antimicrobial chemicals, biosurfactants, and enzymes relevant to pollutant breakdown [9,10].
This dataset focuses on a B. safensis strain obtained from treated drinking water, providing extensive genomic and functional annotations relevant to water treatment and purification. This includes information on biosurfactant generation, which aids in the remediation of organic contaminants. The data are pertinent for researchers examining biological methods for sustainable water management, as B. safensis and similar Bacillus species have shown the capacity to enhance the efficacy of biological treatment procedures and support water reuse initiatives [3,11]. This genomic analysis aims to: (i) characterize the antibiotic resistance profile of B. safensis in treated water, (ii) identify secondary metabolite production capabilities relevant to water treatment applications, and (iii) provide baseline genomic data for monitoring microbial communities surviving current treatment protocols.
Data Description
3
We present the whole genome sequence of B. safensis 2T-2. We conducted a comprehensive genome mining approach to identify mobile genetic elements, phages, antibiotic-resistant genes, metabolic pathways, and the presence of CRISPR genes. Additionally, we performed phylogenomic analysis to determine evolutionary relationships. The circular representation of the genome with the annotation by RAST is presented in Fig. 1. The assembled genome spans 3.78 Mb, distributed across 126 contigs with a GC content of 41.3 % (Table 1). The genome is 99.59 % complete with low contamination of 1.64 % (Table 1).Fig. 1. Circular visualization of B. safensis 2T-2 genome generated using Proksee. Concentric tracks from outermost to innermost show: (1) mobile genetic elements on forward strand, (2) forward strand coding sequences (blue), (3) contig boundaries, (4) reverse strand coding sequences, (5) GC content variation (black line), (6) GC skew analysis (green/purple), and (7) mobile genetic elements on reverse strand. The 3.78 Mb genome is distributed across 126 contigs with 41.3 % GC content.Fig 1. Table 1Genome assembly and annotation statistics for B. safensis 2T-2 isolated from potable water.Table 1. ParameterValueBioproject no.PRJNA1247746Biosample no.SAMN47828673GenBank accession no.JBNJOE000000000SRA accession no.SRR33016476Genome size3785,440 bpGenome coverage153 xCompleteness99.59 %Contamination1.64 %No. of contigs126N50230,017 bpL505No. of coding sequences4000No. of RNAs80GC content41.3 %No. of subsystems327
Whole-genome taxonomic analysis using GTDB confirmed the bacterium’s identity as Bacillus safensis. While B. safensis F0–36b was the closest neighbor in a whole genome phylogenetic tree (Fig. 2A) and further confirmed by the average nucleotide analysis (Fig. 2B).Fig. 2A. Phylogenetic analysis of B. safensis 2T-2. The tree shows relationships between 2T-2 and closely related Bacillus species, with B. safensis F0–36b as the nearest neighbor. Ultrafast bootstrap support values (1000 replicates) are indicated by node circles, with circle size proportional to support value as shown in the legend. The tree was rerooted using Paenibacillus polymyxa ATCC 842 as an outgroup. Isolate 2T-2 is highlighted in red. Visualization was performed in iTOL v7 (https://itol.embl.de/). B. Heatmap of Average Nucleotide Identity (ANI) values of representative Bacillus species used in the phylogenetic tree.Fig 2
Antibiotic resistance genes prediction
3.1
PathogenFinder analysis indicated low pathogenic potential for B. safensis 2T-2 (probability score: 0.168). CARD database analysis revealed six antibiotic resistance genes within the genome. These included vancomycin resistance genes (vanT and vanY) from the vanG cluster, fosfomycin resistance gene (fosBx1), and two disinfectant resistance genes (qacG and qacJ) (Table 2).Table 2. Antibiotic resistance genes identified in the B. safensis 2T-2 genome using the Comprehensive Antibiotic Resistance Database (CARD) with the RGI tool. Genes are categorized by resistance mechanism, target antibiotic class, and gene family according to CARD classification criteria.Table 2RGI CriteriaARO TermAMR Gene FamilyDrugResistance MechanismStrictvanT gene in vanG clusterglycopeptide resistance gene cluster, vanTglycopeptide antibioticantibiotic target alterationStrictBacillus pumilus cat86chloramphenicol acetyltransferase (CAT)phenicol antibioticantibiotic inactivationStrictqacJsmall multidrug resistance (SMR) antibiotic efflux pumpdisinfecting agents and antisepticsantibiotic effluxStrictqacGsmall multidrug resistance (SMR) antibiotic efflux pumpdisinfecting agents and antisepticsantibiotic effluxStrictFosBx1fosfomycin thiol transferasephosphonic acid antibioticantibiotic inactivationStrictvanY gene in vanG clustervanY, glycopeptide resistance gene clusterglycopeptide antibioticantibiotic target alteration
Secondary metabolites and biosynthetic gene cluster identification
3.2
B. safensis 2T-2 harbored 12 biosynthetic gene clusters, including genes encoding siderophore schizokinen synthesis, a RiPP gene cluster for the antibiotic plantazolicin, and three NRPS gene clusters responsible for producing the siderophore bacillibactin, biosurfactant lichenysin, and lipopeptide fengycin (Fig. 3). BAGEL4 identified five gene clusters of interest related to the synthesis of secondary metabolites, including 3 core peptide genes (UviB, plantazolicin, and pumilarin), 5 modification genes, and 1 immunity and transport gene (Fig. 4).Fig. 3. Secondary metabolite biosynthetic gene clusters identified in B. safensis 2T-2 using antiSMASH analysis. The genome contains 12 clusters, including NRPS genes for siderophore production (bacillibactin), biosurfactant synthesis (lichenysin), lipopeptide production (fengycin), RiPP genes for antibiotic synthesis (plantazolicin), and siderophore schizokinen production. Each colored region represents a different biosynthetic gene cluster, with the predicted secondary metabolite type indicated.Fig 3. Fig. 4Bacteriocin gene clusters identified in B. safensis 2T-2 using BAGEL4 analysis. Five gene clusters were detected, containing 3 core peptide genes (UviB, plantazolicin, and pumilarin), 5 modification genes involved in post-translational processing, and 1 immunity and transport gene for self-protection and secretion. Gene arrangements and functional annotations are shown with directional arrows indicating transcription orientation.Fig 4
Experimental Design, Materials, and Methods
4
The B. safensis 2T-2 was isolated from potable water at a municipal water treatment facility in the North West Province of South Africa. Strain was isolated on nutrient agar at 37 °C for 24 h. Pure isolate was obtained after further subculturing on nutrient agar at 37 °C for 24 h. DNA from pure culture was extracted using the chemagic DNA bacteria kit (PerkinElmer, Germany), following the manufacturer’s protocol. The DNA quantity and quality were assessed using a NanoDrop 2000 UV–Visible spectrophotometer (Thermo Fisher Scientific). DNA libraries were prepared using the Nextera XT kit (Illumina Inc., San Diego, CA) according to the instructions provided by the manufacturer and sequenced using the Illumina MiSeq 300 platform with paired-end reads at the North-West University sequencing facility. The generated raw paired-end fastq reads (2 × 300 bp) were quality checked using FastQC v.0.11.7 [12]; low-quality bases were trimmed using Trimmomatic v.0.39 [13], and the data quality was rechecked using FastQC v.0.11.7 [12]. The cleaned reads were assembled using SPAdes v.3.15.5 [14]. QUAST v.5.0.2 [15] was used to evaluate the quality of the genome assembly. The completeness and contamination were assessed using CheckM v.1.1.6 [16]. The assembled draft genome was annotated using the Rapid Annotations using Subsystems Technology (RAST) pipeline [17]. The annotated genomes were assessed against the Genome Taxonomy Database (GTDB) [18] using GTDB-Tk software v.1.7.0 [19]. The circular plot of the genome was created using proksee [20] using the RAST annotated file as input. For phylogenetic analysis, the draft genome of isolate 2T-2, along with reference Bacillus genomes and Paenibacillus polymyxa (used as the outgroup), was analyzed using GTDB-Tk v2.3.2 [19] to generate a concatenated alignment of 120 bacterial single-copy marker genes. The alignment was subsequently used to infer a maximum-likelihood tree in IQ-TREE v2.4.0 [21] with ModelFinder Plus (-m MFP) and 1000 ultrafast bootstrap replicates (-B 1000). The resulting consensus tree was rerooted on P. polymyxa using Gotree v0.4.2 [22]. Antibiotic resistance genes were identified using the RGI tool on the CARD database [23], phages were detected with the phastest tool [11], CRISPR-Cas systems were detected using the CRISPRFinder tool [24], secondary metabolites were detected using the online platform Antibiotics and Secondary Metabolites Analysis Shell (antiSMASH) v.8.0 [25], bacteriocins were detected using the web-based version of Bagel4 [26], mobile genetic elements were identified using the mobileOG-db tool [27], and pathogenicity of the genome was identified using PathogenFinder v.1.1 [28] with the parameters “firmicutes and Assembled Genome/Contig”. Default parameters were used for all software and tools except where otherwise noted.
Limitations
Not applicable
Ethics Statement
The authors have read and followed the ethical requirements for publication in Data in Brief and confirm that the current work does not involve human subjects, animal experiments, or any data collected from social media platforms.
CRediT Author Statement
Oluwaseyi Samuel Olanrewaju: Conceptualization, Data curation, Formal analysis, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing. Lesego G. Molale-Tom: Conceptualization, Investigation, Project administration, Supervision, Writing – review & editing. Rinaldo K. Kritzinger: Conceptualization, Investigation, Methodology. Cornelius C. Bezuidenhout: Conceptualization, Funding acquisition, Resources, Supervision, Writing – review & editing.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Lateef A.Adelere I.A.Gueguim-Kana E.B.The biology and potential biotechnological applications of Bacillus safensis Biologia (Bratisl)704201541141910.1080/13102818.2014.986360 PMC 468406826740788 · doi ↗ · pubmed ↗
- 2Satomi M.La Duc M.T.Venkateswaran K.Bacillus safensis sp. nov., isolated from spacecraft and assembly-facility surfaces Int. J. Syst. Evol. Microbiol.5682006173517401690200010.1099/ijs.0.64189-0 · doi ↗ · pubmed ↗
- 3Harboul K.El Aabedy A.Hammani K.El-Karkouri A.Reduction of hexavalent chromium using Bacillus safensis isolated from an abandoned mine Environ. Technol.45222024449545113767165910.1080/09593330.2023.2256457 · doi ↗ · pubmed ↗
- 4Boby F.Bhuiyan M.N.H.Khan M.S.Parvez M.M.Draft genome sequence data on Bacillus safensis FB 03 isolated from the rhizosphere soil of leguminous plant in Bangladesh Data Brief 60202511152710.1016/j.dib.2025.111527 PMC 1200521840248511 · doi ↗ · pubmed ↗
- 5Golnari M.Bahrami N.Milanian Z.Rabbani Khorasgani M.Asadollahi M.A.Shafiei R.Fatemi S.S.-A.Isolation and characterization of novel Bacillus strains with superior probiotic potential: comparative analysis and safety evaluation Sci. Rep.1412024145710.1038/s 41598-024-51823-z 38228716 PMC 10791968 · doi ↗ · pubmed ↗
- 6Hameed A.Mc Donagh F.Sengupta P.Miliotis G.Sivabalan S.K.M.Szydlowski L.Simpson A.Singh N.K.Rekha P.D.Raman K.Venkateswaran K.Neobacillus driksii sp. nov. Isolated from a Mars 2020 spacecraft assembly facility and genomic potential for lasso peptide production in Neobacillus Microbiol. Spectr.131202510.1128/spectrum.01376-24e 01376-24PMC 1170595339611829 · doi ↗ · pubmed ↗
- 7Olanrewaju O.S.Molale-Tom L.G.Kritzinger R.K.Bezuidenhout C.C.Genome mining of Escherichia coli WG 5D from drinking water source: unraveling antibiotic resistance genes, virulence factors, and pathogenicity Bmc Genomics [Electronic Resource]251202426310.1186/s 12864-024-10110-x 38459466 PMC 10924361 · doi ↗ · pubmed ↗
- 8Cornelius C.B.Molale-Tom Lesego G.Kritzinger Rinaldo K.Oluwaseyi S.O.Draft genome sequences of two Bacillus bombysepticus strains from drinking water Microbiol. Resour. Announc.127202310.1128/mra.00434-23e 00434-23PMC 1035340737358449 · doi ↗ · pubmed ↗
