Draft genome sequences of 13 putatively novel Haemophilus species and strains assembled from human saliva
Daniel Saito, Cristiane Pereira Borges Saito, Fabiana de Souza Cannavan, Siu Mui Tsai

TL;DR
This paper presents draft genome sequences of 13 Haemophilus species and strains found in human saliva, some of which may be new to science.
Contribution
The study identifies two potentially novel Haemophilus species and 11 strains using metagenomic analysis of human saliva.
Findings
Draft genomes of 13 Haemophilus representatives were reconstructed from human saliva samples.
Two potential new Haemophilus species and 11 strains were identified using ANI analysis.
Abstract
We present the draft metagenome-assembled genomes (MAGs) of 13 Haemophilus representatives from human saliva. MAGs were reconstructed by a streamlined pre-assembly mapping approach performed against 9 clinically relevant reference genomes. Overall, genomes belonging to 2 potentially novel Haemophilus species and 11 strains were recovered, as determined by genome-wide ANI analysis.
Funding
- Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
- Fundação de Amparo à Pesquisa do Estado do Amazonas (FAPEAM)
Taxonomy
Topics: Microbial infections and disease research · Genomics and Phylogenetic Studies · Bacterial Infections and Vaccines
ANNOUNCEMENT
Haemophilus are small gram-negative coccobacilli that inhabit the oral cavity and upper respiratory tract of humans (1, 2). Under dysbiotic conditions, Haemophilus spp. may cause localized and systemic infections, including otitis media, sinusitis, conjunctivitis, pneumonia, chancroid, meningitis, and bacteremia (2). Members of this genus are generally fastidious and difficult to culture in the laboratory (2); hence, molecular-based investigations can help shed additional light on their virulence and pathophysiology. In this study, 13 Haemophilus metagenome-assembled genomes (MAGs) were recovered from non-stimulated saliva of healthy and oral disease-associated subjects.
Twenty-seven volunteers were seen at the Dental Clinic of Amazonas State University (Brazil), with no distinction as to gender, age, or ethnicity. All participants signed an informed consent form complying with the seventh version of the Declaration of Helsinki (2013). Non-stimulated saliva samples were collected and submitted to total DNA extraction. Metagenomic DNA was hydrodynamically sonicated, and 1.0 µg of DNA from each sample was used to prepare sample-specific libraries with the NEBNext Ultra DNA Library Prep Kit. Products in the range of 300 bp were selected and sequenced on the Illumina HiSeq 2500 platform. Paired-end reads were merged and adapter sequences excised with PEAR v.0.9.8, while host-related sequences were removed by mapping to the GRCh38.p14 human genome data set (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.40, accessed on 02 June 2023) with Bowtie2 (3), using the “--un-conc” and “--very-sensitive-local” parameters. Species-level mapping of reads was performed with Bowtie2 (3) using the “--fast-local” parameter against NCBI’s reference genomes of H. aegyptius, H. ducreyi, H. haemolyticus, H. influenzae, H. parainfluenzae, H. parahaemolyticus, H. paraphrohaemolyticus, H. pittmaniae, and H. sputorum. Assembly of contigs and binning of MAGs were achieved with SPAdes v.3.15.5 (4) and MaxBin2, respectively. Completeness (50% minimum) and contamination (10% maximum) values were assessed with CheckM v.1.10.18 (5). Taxonomic placement of MAGs was performed with GTDB-Tk v1.7.0 (6) in KBase (7), adopting ANI scores of 95% for species-level definition and between 95% and 97% for strain-level demarcation (8). General MAG annotation was performed with NCBI’s PGAP v.4.11. Antimicrobial resistance determinants were searched with CARD 2023 (9), and carbohydrate-active enzymes were annotated with dbCAN3 (10).
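The two Bowtie2 mapping steps described above can be sketched as command lines assembled in Python. This is a minimal illustration and not the authors' actual invocation: only the flags “--un-conc”, “--very-sensitive-local”, and “--fast-local” come from the text. The index names, file names, paired-read input layout, and the use of “--al-conc” to collect reads that map to a Haemophilus reference are assumptions made for the example.

```python
import shlex
from typing import List


def host_removal_cmd(index: str, r1: str, r2: str, out_prefix: str) -> List[str]:
    """Bowtie2 call that keeps read pairs NOT aligning concordantly to the host."""
    return [
        "bowtie2",
        "--very-sensitive-local",                  # flag named in the text
        "-x", index,                               # hypothetical host index name
        "-1", r1, "-2", r2,                        # assumed paired-read input
        "--un-conc", f"{out_prefix}.nonhost.%.fq", # '%' expands to 1 and 2
        "-S", "/dev/null",                         # host alignments are discarded
    ]


def reference_mapping_cmd(index: str, r1: str, r2: str, out_prefix: str) -> List[str]:
    """Bowtie2 call that maps host-depleted reads to one reference genome."""
    return [
        "bowtie2",
        "--fast-local",                            # flag named in the text
        "-x", index,                               # hypothetical reference index name
        "-1", r1, "-2", r2,
        "--al-conc", f"{out_prefix}.mapped.%.fq",  # assumption: keep mapped pairs
        "-S", f"{out_prefix}.sam",
    ]


# Hypothetical sample and index names, for illustration only.
host = host_removal_cmd("GRCh38_index", "sample_R1.fq", "sample_R2.fq", "sample")
ref = reference_mapping_cmd(
    "H_parainfluenzae_index", "sample.nonhost.1.fq", "sample.nonhost.2.fq", "sample.hpi"
)
print(shlex.join(host))
print(shlex.join(ref))
```

Building the commands as argument lists keeps them easy to inspect or hand to `subprocess.run`; the reference-mapping step would be repeated once per reference genome.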
In all, 113 MAGs were binned from a total of 27 clinical samples. Of these, 13 MAGs fully complied with the quality and taxonomic threshold parameters and were, therefore, selected for taxonomic inference and gene annotation. These MAGs were recovered from 11 distinct saliva specimens, with sample OHS0020_HPI being related to a chronic periodontitis case and the remainder to healthy subjects. According to GTDB-Tk analysis, all MAGs were placed within the Haemophilus genus, 11 of which corresponded to previously unreported strains of H. haemolyticus, H. parahaemolyticus, and H. parainfluenzae. In addition, MAGs OH0009_HAE and OH0010_HAE displayed ANI values <95% with those available in GTDB, suggesting that they represent putatively novel genomes closely related to H. parainfluenzae. These were deposited in GenBank under the names Haemophilus bacterium OH0009 and OH0010, respectively. General taxonomic and annotation features are depicted in Table 1.
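The selection logic can be illustrated with a short Python sketch. It applies the stated thresholds (completeness of at least 50%, contamination of at most 10%, ANI below 95% for a putative novel species, ANI between 95% and 97% for strain-level demarcation). How the boundary values are treated and the label used for ANI above 97% are one possible reading of the text, and the MAG names and values below are invented for illustration rather than taken from Table 1.

```python
from dataclasses import dataclass

MIN_COMPLETENESS = 50.0   # percent, from the text
MAX_CONTAMINATION = 10.0  # percent, from the text
SPECIES_ANI = 95.0        # percent, species-level cutoff from the text
STRAIN_ANI_UPPER = 97.0   # percent, upper bound of the strain-level window


@dataclass
class Mag:
    name: str
    completeness: float
    contamination: float
    best_ani: float  # highest ANI against any reference genome


def passes_quality(mag: Mag) -> bool:
    """True when the MAG meets both quality thresholds."""
    return (mag.completeness >= MIN_COMPLETENESS
            and mag.contamination <= MAX_CONTAMINATION)


def classify(mag: Mag) -> str:
    """Assign a category; boundary handling is an assumption of this sketch."""
    if not passes_quality(mag):
        return "rejected"
    if mag.best_ani < SPECIES_ANI:
        return "putative novel species"
    if mag.best_ani <= STRAIN_ANI_UPPER:
        return "putative novel strain"
    return "matches known strain"


# Invented example values, not data from the study.
examples = [
    Mag("MAG_A", 92.0, 1.5, 93.4),
    Mag("MAG_B", 78.0, 4.0, 96.2),
    Mag("MAG_C", 45.0, 2.0, 96.0),
    Mag("MAG_D", 88.0, 3.0, 98.9),
]
labels = {m.name: classify(m) for m in examples}
for name, label in labels.items():
    print(name, label)
```

Checking quality first mirrors the order in the text, where only bins that met the completeness and contamination criteria went on to taxonomic inference.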
TABLE 1: General information and annotation results of 13 metagenome-assembled genomes belonging to the Haemophilus genus retrieved from non-stimulated saliva of 11 individuals
REFERENCES
- 1. Yamashita Y, Takeshita T. 2017. The oral microbiome and human health. J Oral Sci 59:201–206. doi:10.2334/josnusd.16-0856. PMID: 28637979.
- 2. Nørskov-Lauritsen N. 2014. Classification, identification, and clinical significance of Haemophilus and Aggregatibacter species with host specificity for humans. Clin Microbiol Rev 27:214–240. doi:10.1128/CMR.00103-13. PMID: 24696434. PMCID: PMC3993099.
- 3. Langmead B, Trapnell C, Pop M, Salzberg SL. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25. doi:10.1186/gb-2009-10-3-r25. PMID: 19261174. PMCID: PMC2690996.
- 4. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477. doi:10.1089/cmb.2012.0021. PMID: 22506599. PMCID: PMC3342519.
- 5. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25:1043–1055. doi:10.1101/gr.186072.114. PMID: 25977477. PMCID: PMC4484387.
- 6. Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. 2020. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36:1925–1927. doi:10.1093/bioinformatics/btz848. PMID: 31730192. PMCID: PMC7703759.
- 7. Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, Dehal P, Ware D, Perez F, Canon S, et al. 2018. KBase: the United States Department of Energy Systems Biology Knowledgebase. Nat Biotechnol 36:566–569. doi:10.1038/nbt.4163. PMID: 29979655. PMCID: PMC6870991.
- 8. Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. 2018. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun 9:5114. doi:10.1038/s41467-018-07641-9. PMID: 30504855. PMCID: PMC6269478.
