Complete genome sequences of the Paenibacillus kyungheensis KACC 18744T, Sphingomonas naphthae KACC 18716T, and Novosphingobium humi KACC 19094T
Hyorim Choi, Seunghwan Kim, Miyoung Won, Yunhee Choi, Yonghoon Lee, Yiseul Kim, Jun Heo

TL;DR
This paper presents the complete genome sequences of three bacterial species found in Korea to study their genomic diversity.
Contribution
The novelty lies in providing complete genome sequences of three Korean bacterial type strains for genomic diversity analysis.
Findings
The whole genome sequence of Paenibacillus kyungheensis KACC 18744T was determined.
Genome sequences of Sphingomonas naphthae KACC 18716T and Novosphingobium humi KACC 19094T were reported.
The study contributes to understanding the genomic diversity of Korean bacterial type strains.
Abstract
We report the whole genome sequences of Paenibacillus kyungheensis KACC 18744T, Sphingomonas naphthae KACC 18716T, and Novosphingobium humi KACC 19094T, to investigate the genomic diversity of bacterial type strains distributed in Korea.
| Strain | KACC 18744T | KACC 18716T | KACC 19094T |
|---|---|---|---|
| Species | Paenibacillus kyungheensis | Sphingomonas naphthae | Novosphingobium humi |
| BioProject accession no. | | | |
| BioSample accession no. | | | |
| GenBank assembly accession no. | | | |
| GenBank accession no. | | | |
| SRA accession no. | | | |
| Illumina | | | |
| PacBio | | | |
| HiFi reads | | | |
| Total length (bp) | 1,227,207,177 | 1,714,875,056 | 1,475,901,199 |
| Total no. of reads | 136,392 | 190,527 | 171,360 |
| | 9,721 | 9,666 | 9,271 |
| Mean quality | Q35 | Q32 | Q33 |
| Illumina | | | |
| Total length (bp) | 3,355,346,840 | 3,219,202,522 | 3,158,184,630 |
| Paired length (bp) | 2,623,250,643 | 2,342,773,538 | 2,402,495,342 |
| Filtered no. of reads | 17,380,330 | 15,521,426 | 15,915,742 |
| Q20 (%) | 99.36 | 99.37 | 99.35 |
| No. of contigs | 1 (circular) | 5 (circular) | 4 (circular) |
| Total length (bp) | 5,258,865 | 4,309,746 | 4,890,308 |
| Corrected total length (bp) | 5,260,882 | 4,310,851 | 4,890,578 |
| G+C content (%) | 39.3 | 67.3 | 63.6 |
| Sequencing depth | 233.0× | 397.2× | 301.4× |
| BUSCO completion (%) | 99.19 | 100 | 99.19 |
| Chromosome length (bp) | 5,260,882 | 3,919,827 | 3,439,207 |
| Annotation results | | | |
| No. of genes | 4,586 | 4,199 | 4,435 |
| No. of CDS | 4,480 | 4,143 | 4,368 |
| No. of RNA genes | 106 | 56 | 67 |
ANNOUNCEMENT
The genomic information of type strains plays a crucial role in bacterial phylogenetics, functional gene analysis, and comparative genomic analyses. Despite its importance, genome analysis was not mandatory for the description of novel bacterial species prior to 2018 (1), and many type strains reported before this time still lack sufficient genomic data. To address this gap, we conducted genome sequencing of type strains preserved at the Korean Agricultural Culture Collection (KACC). Specifically, our study focused on three type strains (Paenibacillus kyungheensis KACC 18744^T^, Sphingomonas naphthae KACC 18716^T^, and Novosphingobium humi KACC 19094^T^) that were isolated and reported in Korea between 2015 and 2017 (2–4) and for which genomic data had remained unavailable until now.
These strains were cultured on Reasoner’s 2A medium (BD Difco, NJ, USA) at pH 6.0 and 28°C for 3 days under aerobic conditions. Genomic DNA was extracted using the Qiagen MagAttract HMW DNA kit (Qiagen, Hilden, Germany) according to the manufacturer’s protocol. The extracted DNA was used for genome sequencing on an Illumina MiSeq (Illumina, CA, USA) and a PacBio Sequel IIe (Pacific Biosciences, CA, USA). Genomic DNA libraries were created using the TruSeq Nano DNA High Throughput Library Prep kit (Illumina, CA, USA). A total of 10 µL of library was prepared as 7–12 kb size templates for the PacBio SMRTbell prep kit 3.0. The libraries were then processed using the Sequel II Bind Kit 3.2 and Int Ctrl 3.2. Sequencing was carried out using the Sequel II Sequencing Kit 2.0 and SMRT cell 8M trays. HiFi reads were obtained from the PacBio Sequel IIe system and assembled using the microbial assembly application in SMRT Link 11.0.0.146107 software with default parameters, based on the Hierarchical Genome Assembly Process (5). HiFi reads were generated with a quality value of 20, corresponding to 99% predicted accuracy.
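The equivalence between a quality value of 20 and 99% predicted accuracy follows from the standard Phred definition, Q = -10 log10(P_error). The short sketch below illustrates that conversion; the function names are hypothetical and are not part of SMRT Link or any other tool named in this announcement.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Predicted per-base accuracy for Phred quality value q (Q = -10 * log10(P_error))."""
    return 1.0 - 10.0 ** (-q / 10.0)


def accuracy_to_phred(accuracy: float) -> float:
    """Phred quality value for a predicted accuracy strictly between 0 and 1."""
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    # Q20 is the HiFi threshold stated in the text; Q30 is the Illumina threshold.
    for q in (20, 30):
        print(f"Q{q}: predicted accuracy {phred_to_accuracy(q):.2%}")
```

By the same formula, the Q30 threshold applied to the Illumina reads corresponds to 99.9% predicted accuracy.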
Illumina raw reads were filtered to retain those in which at least 90% of the bases had a Phred score of 30 or higher, and adapter trimming was performed using Trimmomatic 0.38 (6). The assembly, initially constructed with long-read data, was then polished with Pilon v1.21 to correct errors and enhance precision (7). Successful circularization was confirmed with the microbial genome analysis application in SMRT Link 11.0.0.146107 (8), and completeness was checked with BUSCO v5 (9). Gene prediction and annotation were conducted with the NCBI Prokaryotic Genome Annotation Pipeline v6.7 (PGAP) (10). All tools were executed with default parameters unless stated otherwise.
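The read-level quality criterion described above (90% of bases at Phred 30 or higher) can be sketched as follows. This is only an illustration of the criterion, assuming Phred+33 encoded FASTQ quality strings and assuming that passing reads are the ones retained; it is not the tool the authors used, and the function names and defaults are hypothetical.

```python
def fraction_at_or_above(quality_string: str, threshold: int = 30, offset: int = 33) -> float:
    """Fraction of bases whose Phred score is >= threshold (Phred+33 encoding assumed)."""
    if not quality_string:
        return 0.0
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(score >= threshold for score in scores) / len(scores)


def passes_filter(quality_string: str, min_fraction: float = 0.90, threshold: int = 30) -> bool:
    """True when at least min_fraction of the bases reach the Phred threshold."""
    return fraction_at_or_above(quality_string, threshold) >= min_fraction


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    good_read = "I" * 9 + "#"   # 9 of 10 bases at Q40
    poor_read = "I" * 8 + "##"  # 8 of 10 bases at Q40
    print(passes_filter(good_read), passes_filter(poor_read))
```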
The genome of P. kyungheensis KACC 18744^T^ consists of a single circular chromosome (5,260,882 bp with GC content of 39.5%). The genome of S. naphthae KACC 18716^T^ (4,310,851 bp in total with GC content of 67.5%) consists of a single circular chromosome (3,919,827 bp) and four other contigs. The genome of N. humi KACC 19094^T^ (4,890,578 bp in total with GC content of 63.5%) consists of a single circular chromosome (3,439,207 bp) and three other contigs. Additional details and annotation results for the three genomes are shown in Table 1.
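As a rough cross-check of how the lengths in Table 1 relate to each other, the sketch below derives the combined length of the non-chromosomal contigs (corrected total length minus chromosome length) and an approximate HiFi depth (total HiFi bases divided by corrected total length). The numbers are copied from Table 1; the depth formula is an assumption, since the announcement does not state how sequencing depth was calculated, and it gives values close to but not identical to those reported.

```python
# Values copied from Table 1 (all in bp). Dictionary keys and field names are illustrative.
TABLE_1 = {
    "KACC 18744T": {"hifi_bases": 1_227_207_177, "corrected_total": 5_260_882, "chromosome": 5_260_882},
    "KACC 18716T": {"hifi_bases": 1_714_875_056, "corrected_total": 4_310_851, "chromosome": 3_919_827},
    "KACC 19094T": {"hifi_bases": 1_475_901_199, "corrected_total": 4_890_578, "chromosome": 3_439_207},
}


def non_chromosomal_length(row: dict) -> int:
    """Combined length of all contigs other than the chromosome."""
    return row["corrected_total"] - row["chromosome"]


def approximate_hifi_depth(row: dict) -> float:
    """Assumed depth estimate: total HiFi bases divided by corrected assembly length."""
    return row["hifi_bases"] / row["corrected_total"]


if __name__ == "__main__":
    for strain, row in TABLE_1.items():
        print(
            f"{strain}: non-chromosomal contigs {non_chromosomal_length(row):,} bp, "
            f"approximate HiFi depth {approximate_hifi_depth(row):.1f}x"
        )
```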
REFERENCES
- 1. Chun J, Oren A, Ventosa A, Christensen H, Arahal DR, da Costa MS, Rooney AP, Yi H, Xu X-W, De Meyer S, Trujillo ME. 2018. Proposed minimal standards for the use of genome data for the taxonomy of prokaryotes. Int J Syst Evol Microbiol 68:461–466. doi:10.1099/ijsem.0.002516. PMID: 29292687
- 2. Siddiqi MZ, Siddiqi MH, Im WT, Kim YJ, Yang DC. 2015. Paenibacillus kyungheensis sp. nov., isolated from flowers of magnolia. Int J Syst Evol Microbiol 65:3959–3964. doi:10.1099/ijsem.0.000521. PMID: 26268929
- 3. Chaudhary DK, Kim J. 2016. Sphingomonas naphthae sp. nov., isolated from oil-contaminated soil. Int J Syst Evol Microbiol 66:4621–4627. doi:10.1099/ijsem.0.001400. PMID: 27506439
- 4. Hyeon JW, Kim K, Son AR, Choi E, Lee SK, Jeon CO. 2017. Novosphingobium humi sp. nov., isolated from soil of a military shooting range. Int J Syst Evol Microbiol 67:3083–3088. doi:10.1099/ijsem.0.002089. PMID: 28829033
- 5. Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, Clum A, Copeland A, Huddleston J, Eichler EE, Turner SW, Korlach J. 2013. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods 10:563–569. doi:10.1038/nmeth.2474. PMID: 23644548
- 6. Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120. doi:10.1093/bioinformatics/btu170. PMID: 24695404. PMCID: PMC4103590
- 7. Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, Earl AM. 2014. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9:e112963. doi:10.1371/journal.pone.0112963. PMID: 25409509. PMCID: PMC4237348
- 8. Hunt M, Silva ND, Otto TD, Parkhill J, Keane JA, Harris SR. 2015. Circlator: automated circularization of genome assemblies using long sequencing reads. Genome Biol 16:294. doi:10.1186/s13059-015-0849-0. PMID: 26714481. PMCID: PMC4699355
