A cross-environment comparison of nontuberculous mycobacterial diversity
Matthew J. Gebert, Ettie M. Lipner, Jordan M. Galletta, Jessica B. Henley, Michael Hoffert, Melissa L. Riskin, D. Rebecca Prevots, Noah Fierer

TL;DR
This study compares the diversity and abundance of nontuberculous mycobacteria (NTM) in different environments to identify which pose the greatest risk for human infection.
Contribution
The study reveals that household plumbing biofilms are the most significant reservoir for clinically relevant NTM species.
Findings
Household plumbing biofilms had the highest relative abundance of the genus Mycobacterium (13.7% on average).
Mycobacterium avium and Mycobacterium abscessus were most frequently detected in household plumbing biofilms.
NTM diversity varies significantly between environments, with clinically relevant species largely restricted to household plumbing.
Abstract
Nontuberculous mycobacteria (NTM) are a group of environmental bacteria that encompass nearly 200 described species, some of which can cause chronic pulmonary and extrapulmonary infections in humans. What makes these infections unique is that they are environmentally acquired, yet there remains a limited understanding of how different environments contribute to potential pathogen exposure. Here, we use new and existing marker gene data sets to compare the amounts and types of NTM across three environments known to harbor mycobacteria, surface waters, soil, and household plumbing biofilms, to better understand potential pathogen occurrence in each environment. We used 16S rRNA gene sequencing, in tandem with mycobacterial-specific marker gene sequencing, to characterize variation in the relative abundances of the genus Mycobacterium and specific mycobacterial taxa across the three…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Fig 1
Fig 2
Fig 3
Fig 4
Fig 5Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMycobacterium research and diagnosis · Tuberculosis Research and Epidemiology · Immunodeficiency and Autoimmune Disorders
INTRODUCTION
Nontuberculous mycobacteria (NTM) are a group of bacteria commonly found in a wide range of environments. They are defined, in part, by their unique cell wall structure, consisting of long-chain fatty acids known as mycolic acids, that allow for their prolonged survival during periods of desiccation and nutrient limitation, as well as resistance to common disinfectants and antibiotics (1). Although most members of the genus are of limited clinical significance, some species (e.g. Mycobacterium avium complex, Mycobacterium abscessus, and Mycobacterium kansasii) can cause pulmonary and extrapulmonary infections in humans, especially in individuals with underlying structural lung damage (2). NTM disease is of growing concern worldwide, with case numbers increasing annually across the United States and around the world (3–5).
NTM disease is widely considered to be environmentally acquired, with many environments implicated in the exposure and transmission of pathogenic NTM (6–8). While NTM have been detected in a wide range of environments, including soil, household plumbing, hot tubs and pools, and natural water bodies (9), few studies have comprehensively characterized NTM communities across a range of environments or quantified the occurrence of clinically relevant NTM taxa within different types of environments. This knowledge gap persists, in part, because of the widespread reliance on cultivation-dependent approaches for assessing NTM diversity in environmental samples, and the biases and limitations associated with such approaches (10–14). Many environmental NTM remain undescribed or poorly characterized (15), and differentiating clinically relevant NTM from those that are not clinically relevant requires time-consuming genotyping of isolates (16, 17). Likewise, previous work characterizing the presence of clinically relevant NTM in different environments has often focused on relatively small numbers of representative samples (18–20), making it challenging to reach broader conclusions about the potential importance of different environments as sources of NTM infections. In short, much of our knowledge on which environments most likely harbor clinically relevant NTM is derived from limited or anecdotal evidence.
Previous studies have used cultivation-independent approaches to survey NTM diversity in two environments often implicated as reservoirs of clinically relevant NTM: premise plumbing and surface soils. More specifically, Gebert et al. analyzed showerhead biofilms collected from 638 homes across the United States (7) and Walsh et al. analyzed 143 soils from across the globe (15), using a combination of 16S rRNA and hsp65 gene sequencing of DNA extracted directly from the environmental samples to characterize NTM diversity and composition. Both survey efforts found that the NTM communities found in these environments are diverse and highly variable in composition with clinically relevant NTM frequently detected in building premise plumbing. Surface waters, including lakes, streams, and reservoirs, are also implicated as sources of NTM exposures and are known to harbor a broad diversity of NTM, including clinically relevant taxa (20–24). However, it remains unclear how NTM communities vary across surface waters and whether surface waters represent an important source of exposures to clinically relevant NTM. More generally, it remains unclear how NTM diversity differs across these environments and which environments represent the most important potential sources of NTM exposures.
For this study, we used high-throughput marker gene sequencing, targeting both the 16S rRNA gene and the mycobacterial-specific 65-kD heat shock protein (hsp65) gene to characterize the ubiquity and types of NTM detected across 373 surface water samples collected from 102 sites across the US. We then integrated the surface water data sets with previously published data sets from 143 soils (15) and 638 household plumbing biofilm samples (7) to directly compare the diversity and ubiquity of NTM found across these three major environment types. The comprehensive survey of NTM diversity in surface waters conducted here, when coupled with the soil and premise plumbing data sets, made it possible to quantitatively compare NTM communities across environments and identify which environments are most likely to harbor clinically relevant NTM.
MATERIALS AND METHODS
Surface water samples
A total of 373 surface water samples were collected from surface water sites across the United States that are monitored by the US Geological Survey (USGS) as part of the National Water Quality Network (NWQN) (25). These samples were collected from 102 unique surface water collection sites across 28 states (for site map, see Fig. 1). At 73 of the sites, multiple samples were collected over time starting in July 2022 and concluding in November 2022. The collection sites spanned eight different surface water types, with the majority coming from streams, but also including large inland and coastal rivers, lakes, and reservoirs (see Table S1 for a complete list of site types). For the 16S rRNA gene sequencing analysis, all samples (N = 373) were included in downstream analysis.
Map showing the distribution of surface water sample collection sites, colored in purple, across the United States (N = 102), as part of the National Water Quality Network (USGS). For information on collection locations for the soil and household plumbing biofilm samples (see Walsh et al. [15] and Gebert et al. [7]). The map was created in R using the usmap package.
Methods of sample collection used by the NWQN conform to the USGS National Field Manual for the Collection of Water-Quality Data (26). To the greatest extent possible, isokinetic, depth-integrated sampling techniques that provide samples representative of stream conditions are used. The collection of isokinetic, depth-integrated samples is done using either an equal-width-increment or equal-discharge-increment sampling method that yields a composite sample that represents the streamflow-weighted concentrations of the stream cross section being sampled. For each sample collection, a total of 0.2–10 L of water was filtered immediately through 0.45 µm GWV high-capacity groundwater sampling capsule filters (Cytiva Life Sciences) by USGS personnel. Collection filters were then stored at −20°C at individual sites until the final date of collection (November 2022). All capsule filters were shipped overnight on ice to the University of Colorado Boulder, where they were stored at −20°C until DNA extraction.
Metadata associated with all samples were downloaded from the Water Quality Portal (https://www.waterqualitydata.us/) and included information on water temperature (°C), water pH, dissolved oxygen (% saturation), total carbon (mg/L), total nitrogen (mg/L), sodium adsorption ratio, and turbidity at the time of sampling (Table S2).
Household plumbing biofilm samples
We re-analyzed a previously published data set from a large-scale survey of showerhead biofilms, treating the showerhead biofilms as a proxy for household plumbing biofilms (7). For this analysis, we focused only on those biofilm swab samples collected from households in the United States (N = 638, 49 out of 50 states in the United States represented). For specific site information and sampling details (see Gebert et al. [7] and Fig. S1).
Global soil samples
We also re-analyzed a previously published soil data set from Walsh et al. (15), which focused on a subset of 143 soils from collection sites across six continents, originally collected for the study described in Delgado-Baquerizo et al. (27). Composite samples were collected from each site to a depth of 7.5 cm with all samples stored at −20°C immediately after collection. For additional soil and site information, (see Walsh et al. and Delgado-Baquerizo et al. [15, 27] and Fig. S1).
DNA extraction
DNA was extracted from the showerhead biofilm and soil samples using the DNEasy PowerSoil kit (Qiagen, Germantown, MD, USA) (see Gebert et al. [7] and Walsh et al. [15] for further details). To extract DNA from the surface water samples, filters were first thawed at room temperature for ~1 h prior to downstream processing. Autoclaved 7/16 in. brass caps were attached to the bottom of each filter capsule prior to thawing. Immediately before shaking, 5 mL of 5% Triton was added to each filter capsule, and the filter was capped on the top end, sealing the capsule. Capsule filters were shaken at 20 Hz for 7.5 min, then inverted and shaken for another 7.5 min. We then used an air-filled 50 mL syringe to force liquid out of each capsule in the direction of flow, collecting 2–5 mL of eluent per capsule into a DNA-free 15 mL conical tube, with the eluents frozen at −20°C immediately after collection. After thawing the eluents at room temperature, we extracted DNA using the DNeasy PowerWater kit (Qiagen), following the manufacturer’s recommended protocol. In each round of extraction, we included an extraction blank to check for contamination.
16S rRNA gene sequencing
We amplified and sequenced a portion of the 16S rRNA gene using bacterial and archaeal primers to determine the relative abundance of the genus Mycobacterium in each sample. For a detailed description of library preparation and sequencing for the soil and household plumbing biofilm samples (see Walsh et al. [15] and Gebert et al. [7]). To characterize the bacterial community of the extracted surface water, we used the same barcoded 515fBC/806rB primers (28) as aforementioned, which target the V4 region of the 16S rRNA gene. Duplicate PCR reactions were performed as follows: 12.5 µL of Colorless Platinum II Hot-Start Master Mix (ThermoFisher, Waltham, MA, USA), 10 µL of Sigma water (Sigma-Aldrich, Burlington, MA, USA), 1 µL of primers and 1 µL of gDNA. Thermocycler conditions were 94°C for 2 min, 35 cycles at 94°C for 15 s, 60°C for 15 s, 68°C for 1 min, and then 72°C for 10 min. Amplicon normalization and clean-up was done using the Invitrogen SequalPrep Normalization Plate kit (ThermoFisher) with 5 µL of amplicons from each sample pooled. The composited library was then quantified using the Invitrogen Qubit dsDNA HS Assay kit (ThermoFisher) and ABsolute qPCR Mix, SYBR Green, no ROX (ThermoFisher). The library was run on the Illumina MiSeq using a v2-300 cycle kit (Illumina, San Diego, CA, USA) at the Center for Microbial Exploration at the University of Colorado Boulder.
Hsp65 gene sequencing
As 16S rRNA gene sequence data are not generally useful for identifying particular mycobacterial taxa (29), we also sequenced the 65-kD heat shock protein (hsp65) gene using mycobacterial-specific primers Tb11-Tb12 (30). For a detailed description of library preparation and sequencing for soil and household plumbing biofilm samples (see Walsh et al. [15] and Gebert et al. [7]). For the surface water DNA extracts, the same protocol described in Gebert et al. (7) and Walsh et al. (15) was followed except we used Platinum II Hot-Start Master Mix (ThermoFisher) for PCRs and amplicons were cleaned by adding 8.85 µL DNA-free water and 0.251 µL of ExoSAP (New England Biolabs, Ipswich, MA) to a 20 µL aliquot of each amplicon followed by incubation at 37°C for 30 min followed by 95°C for 5 min.
A second PCR was done to attach 12 bp universal reverse barcodes for sample multiplexing. Each reaction consisted of 12.5 µL of Platinum Hot-Start Master Mix (ThermoFisher), 8.5 µL of Sigma water (Sigma-Aldrich), 2 µL of combined Illumina universal barcoded primers stock at 10 µM each (Illumina), and 2 µL of cleaned amplicon from the first reaction, with thermocycler conditions as follows: 95°C for 3 min, then 8 cycles of 95°C for 30 s, 55°C for 30 s, 72°C for 30 s, then 72°C for 2 min and held at 4°C. We followed the same protocol as above for normalization and quantification of the library. Sequencing was done using a v3-600 cycle kit (Illumina) on the Illumina MiSeq at the Center for Microbial Exploration at the University of Colorado Boulder.
16S rRNA gene sequence analyses
Raw reads from the surface water samples were demultiplexed using idemp (https://github.com/yhwu/idemp), and primers were trimmed from reads using cutadapt (version 1.8.1) (31). Data were processed using the DADA2 pipeline (version 1.22.0) (32). Reads were filtered based on the quality metrics (trunclen = (150, 140), maxEE = (2, 2), truncQ = 2, and maxN = 0). After the removal of chimeras, taxonomy was assigned against the SILVA nr v.138.1 database (33). This resulted in 376 samples being used for downstream analysis. The 16S rRNA marker gene data sets from the two previously published studies (household plumbing biofilms and soil) were generated using an identical procedure (see Gebert et al. [7] and Walsh et al. [15]). After processing and quality filtering, a total of 638 household plumbing biofilm samples and 143 soil samples were included in this study. Combined with the 373 surface water samples that met our threshold for inclusion (minimum 2,000 archaeal and bacterial 16S rRNA gene reads per sample), the combined 16S rRNA gene data set used for downstream analyses included a total of 1,154 samples.
Hsp65 sequence analyses
The hsp65 marker gene sequence data from each of the three datasets were combined and processed together using a single pipeline to maintain consistency and permit direct comparison of results across the three unique environments. First, each hsp65 data set was demultiplexed independently using idemp (https://github.com/yhwu/idemp). Primers were trimmed from reads using cutadapt (version 1.8.1) (31). Data were then processed using the DADA2 pipeline (version 1.22.0) (32). Each data set was filtered independently based on the quality metrics for each respective MiSeq run. Error rates were then determined, and sequence variants were inferred (pool = TRUE), resulting in three amplicon sequence variant (ASV) tables, one for each environment. Sequence tables from each of the three independently processed data sets (household plumbing biofilms, global soils, and surface waters) were then merged into a single table spanning the three distinct environments.
Since the primer pair used to PCR amplify a portion of the hsp65 gene (Tb11-Tb12) is not specific to members of the genus Mycobacterium, it was necessary to remove all non-mycobacterial reads. The combined ASV representative sequence file (repset) generated above was then filtered using the updated NTM database as the reference database using blastn (command: makeblastdb -in < NTM DB >) (see below for details) at a minimum match percent identity of 90% or greater. The combined ASV repset file was then filtered using seqtk (https://github.com/lh3/seqtk) to retain only ASVs that met the 90% or greater match threshold to the reference database.
Samples with fewer than 100 mycobacterial hsp65 reads were removed from the data set, resulting in a total of 822 samples total across the three environments (household plumbing biofilm N = 507, soil N = 140, and surface waters N = 175). This table was used in all downstream NTM analyses. Across these 822 samples, we detected a total of 7,854 mycobacterial ASVs.
Updated hsp65 marker gene database and phylogenetic tree
The NTM phylogenetic tree was built using an updated version of the Dai et al. 65-kD heat shock protein marker gene database (34), and visualized using the ETE 3 Toolkit (35). Briefly, the database was aligned and trimmed using MAFFT (version v7.505 [2022/Apr/10]) (36) and trimal (v1.4.rev15 build [2013-12-17]) (37), respectively. The aligned database was then used to train a Hidden Markov Model using HMMR (HMMER 3.3.2 [Nov 2020]) (38). The HMM was then used to search (hmmsearch) a curated file of mycobacterial whole-genome sequences extracted from the Genome Taxonomy Database (39). To remove any sequence noise and mismatches from the output file, a phylogenetic tree was built using RAxML (version 8.2.12), and divergent sequences were removed from the database. The final database resulted in 189 mycobacterial heat shock protein sequences, adding an additional 32 NTM species or strains to the 2011 Dai et al. database. The updated reference database can be found at https://github.com/gebertm/hsp65_Tb11-Tb12_db.
ASV clustering
All 7,854 mycobacterial ASVs were clustered to combine ASVs that were closely related based on phylogenetic analyses. This was done as we observed a high diversity of individual ASVs and because most ASVs were only found in a relatively small number of samples across all three datasets (mean of 6.4 samples per ASV). To conduct the phylogenetic clustering, we built a phylogenetic tree, including both the representative sequence file and updated hsp65 marker gene database, to assign taxonomy based on proximity to a known reference database sequence, using Nocardia farcinica (DSM43665) as the outgroup. Briefly, sequences were aligned using MAFFT (36) (v7.505 [2022/Apr/10]), and aligned sequences were then trimmed using trimAl (v1.4.rev15 build [2013-12-17]). The maximum likelihood tree was built using RAxML (version 8.2.12, 2018) with the following command raxmlhpc -f a -m GTRGAMMA -p 12,345x 12345 -# 100 T 30 --print-identical-sequences.
To cluster ASVs, we used the TreeCluster package (40) (version 1.0.4) with the distance threshold t = 0.20, meaning 0.20 is the maximum pairwise distance between leaves in a given cluster. The optimal threshold was set based on the clustering of the Mycobacterium tuberculosis complex reference sequences into one unique cluster, without including additional NTM taxa. TreeCluster resulted in 404 unique clusters, encompassing the 7,854 ASVs. For a cluster to be present in a sample, it was required to have a total of >10 reads across the combined data set. Sixty-two ASVs did not fall within a defined cluster (62 out of 7,854, or 0.79%), designated as −1 in the final cluster table (Table S3), and were excluded from the downstream analysis. Taxonomic names were assigned at the cluster level, when possible (i.e., clusters that included named reference strains were assigned the name of that reference strain). Only 74 out of 404 clusters (18%) contained one or more reference sequences, meaning most clusters (82%) could not be confidently assigned to the species level based on their phylogenetic similarity to a named species.
The top 50 clusters found in each environment, which encompassed >99%, 66%, and 95% of the total reads in household plumbing biofilms, soil, and surface waters, respectively, were selected for downstream analyses. Since the NTM diversity in the soil samples was considerably higher compared with the other two environments, a smaller percentage of the total NTM reads from the soil samples were captured in the top 50 clusters.
Statistical analysis
To visualize the variation in community structure of all NTM species, we ran a principal component analysis (PCoA) on the cleaned cluster table using the Phyloseq package in R (41). Briefly, we converted the cluster table from read counts to a presence/absence matrix, requiring a minimum of 10 reads per cluster for it to be considered “present” in the data. The dissimilarity matrix was calculated using pairwise Jaccard distances.
To test whether any of the measured environmental variables collected at each USGS site were associated with the NTM community composition in the surface water samples, we ran a Mantel test on samples using the R vegan package (version 2.6-8) (42). Briefly, for the measured environmental variables, an additional dissimilarity matrix was calculated using vegdist in the R vegan package (version 2.6-8) (42) using pairwise Euclidean distance. A Mantel test was then run between the taxa dissimilarity matrix and the metadata dissimilarity matrix to determine if any of the measured environmental variables were associated with the NTM community composition across samples. Figures and maps were generated using the ggplot and cowplot packages (43, 44) and the usmap package (version 0.7.1) (https://github.com/pdil/usmap), respectively, in R.
RESULTS
Relative abundance of genus Mycobacterium across the three environments
We used the 16S rRNA gene sequence data to determine the relative abundance of the genus Mycobacterium across the 1,154 samples from the three environments (Fig. 2). Members of the genus Mycobacterium were nearly ubiquitous across the three environments, detected in 100% of the 143 soils, 76% of the 373 surface water samples, and 94% of the 638 household plumbing biofilms. However, the relative abundances of mycobacteria varied depending on the environment. In household plumbing biofilms, mycobacterial abundances ranged from no mycobacteria detected (40 samples) to mycobacteria representing >95% of the bacterial communities (4 samples). The median relative abundance of the genus Mycobacterium in the plumbing biofilms (3.7%) was appreciably higher than that observed in soil (median relative abundance = 0.37%) and surface waters (median relative abundance = 0.015%). Only 1.3% of surface water samples (5 out of 373) and 11% of soil samples (16 out of 143) had mycobacterial abundances over 1%, while 60% of household plumbing biofilm samples had mycobacterial abundances greater than 1% (384 out of 638) (Fig. 2).
Relative abundances of the genus Mycobacterium across three environments (household plumbing biofilm [n = 638], soil [n = 143], and surface water [n = 373]). Relative abundances represent the proportional abundances of taxa assigned to the genus Mycobacterium compared to all other bacteria and archaea detected in each sample from 16S rRNA gene sequencing effort. All data points for soil are >0. Median relative abundances of the genus in each environment are noted above each scatterplot.
Mycobacterial species-level survey across the three environments
While 16S rRNA marker gene sequencing allowed us to compare the relative abundance of the genus Mycobacterium across environment types, species-level information obtained through mycobacterial-specific marker gene sequencing of the hsp65 gene is crucial for understanding the diversity of mycobacterial taxa in each environment, especially those of clinical significance. To better summarize the broad-scale diversity of the NTM in these environments, we phylogenetically grouped mycobacterial ASVs into “clusters” based on phylogenetic relatedness. Most NTM clusters found in the household plumbing biofilms clustered with known reference sequences (>80% sequence similarity to an isolate found in the reference database), including many of known clinical significance (e.g., M. avium and Mycobacterium chelonae). In contrast, many NTM clusters identified in soil and surface waters could not be identified below the genus level of taxonomic resolution as they did not include any named species in our phylogenetic analyses. When we focused on the top 50 NTM clusters detected in each of the three environments (150 in total), we found that in household plumbing biofilms, 24 clusters were found to contain a reference sequence (48%), but only 24% of clusters in soil and 16% of clusters in surface waters shared similarity with a reference sequence. All three environments harbored a broad diversity of NTM, spanning the mycobacterial phylogenetic tree (Fig. 3); however, the amount of novel taxonomic diversity observed in natural environments was higher than what was observed in household plumbing biofilms.
Maximum likelihood tree, using the mycobacterial hsp65 gene, showing 189 reference strains, plus one representative ASV from each of the top 50 NTM clusters detected in each environment (135 unique ASVs total). Colors of the outside ring represent the environment in which the NTM cluster was detected (household plumbing biofilm = green, soil = brown, and surface waters = blue). The phylogenetic tree was rooted using the hsp65 sequence from Nocardia farcinica DSM43665.
NTM cluster occupancy
Although NTM were detected in nearly every sample, regardless of the environment in question, the composition of the mycobacterial communities varied across environment types (Fig. 4). The most abundant NTM taxa found in the household plumbing biofilm samples were M. gordonae, followed closely by M. mucogenicum/phocaicum, while the most abundant taxa found in soil were M. madagascariense and M. holsaticum. The most abundant NTMs in the surface water data set could not be classified to species as they were not closely related to any named NTM species in our reference database (Fig. 4C). The variation in NTM community composition across the household plumbing biofilm samples was far greater than the variation detected across either the soil or surface water samples (Fig. 4). Additionally, the composition of the NTM communities found in surface waters and soils was generally more similar to each other than to those found in the household plumbing biofilm samples (Fig. 4), highlighting that each of the three broad environment types harbors distinct NTM communities. Notably, of the measured surface water variables collected (Table S2), only water temperature (°C) was positively correlated, albeit weakly, with the NTM community composition across samples (r statistic = 0.10, P < 0.05).
Bar plots showing the percent of samples (percent occupancy, i.e., percent of samples from each environment in which a given taxon was detected). (A) Household plumbing, (B) global soils, and (C) surface waters. Bars shaded in red are NTM taxa inferred to have high clinical significance, while bars shaded in yellow have low, or limited, clinical significance. Bars outlined in gray have no reported clinical significance. Taxa indicated by “M. ssp.” represent lineages that do not include any named mycobacterial isolates. On the right of each bar plot are shown principal coordinate analysis (PCoA) colored by environment type (household plumbing biofilms in green, soils in brown, and surface waters in blue), showing variation in overall NTM community composition across the three environments. We note that the three plots are identical except for the highlighting of specific sample types.
Relative abundance and ubiquity of clinically relevant NTM in the environment
The number of samples that contained clinically relevant NTM varied between environment types, with most clinically relevant taxa detected in the household plumbing biofilm samples (M. avium complex, M. mucogenicum, and M. abscessus), while the top 50 taxa detected in soils only contained a handful of named strains, two of which have limited clinical significance (M. florentinum and M. riyadhense), were detected in 17% and 14% of samples, respectively. The surface water samples contained only two named clinically relevant NTM, M. chelonae complex and the M. scrofulaceum complex, which were detected in 2.9% and 2.3% of samples, respectively. Due to the rarity of clinically significant NTM in our surface water data set, we did not have the statistical power to run any sort of correlation analyses between the measured environmental variables and the detection of clinically significant taxa.
We chose to focus on six clinically relevant nontuberculous mycobacterial taxa that are frequently detected in clinical samples from human patients (M. mucogenicum/M. phocaicum, M. avium complex, M. chelonae complex, M. abscessus complex, and M. fortuitum complex [45]) and how their relative abundances and occurrence patterns varied across the three environments (Fig. 4). All six clinically relevant NTM clusters were detected in household plumbing biofilms. Four of the six taxa (M. mucogenicum/phocaicum, M. avium complex 1, M. chelonae complex, and M. abscessus complex) were the most ubiquitous in household plumbing biofilms (Fig. 5). For example, M. mucogenicum/phocaicum was found in ~39% of samples from household plumbing biofilms, with a mean relative abundance of 16% of mycobacterial hsp65 reads, compared to less than 1% in the natural environments. The M. avium complex 1 was not detected in either soil or surface water samples but was found in nearly 7% of the household plumbing biofilm samples, with a mean relative abundance of 1.7% (Fig. 5). The M. avium complex 2 and the M. fortuitum complex, although not detected in surface waters, were detected in a greater number of soil samples compared to household plumbing biofilms (10% vs 6.9% and 12.1% vs 2%, respectively). Although the M. avium complex 2 was detected across more soil samples, it had a higher mean relative abundance in household plumbing biofilms (1.4%), compared with soil (0.29%).
Scatterplots showing the relative abundance of six clinically relevant NTM groups across the three environments: (A) M. mucogenicum complex, (B) M. avium complex 1, (C) M. avium complex 2, (D) M. chelonae complex, (E) M. abscessus complex, and (F) M. fortuitum complex. Mean relative abundances and 95% confidence intervals are shown in red. Relative abundances were calculated by dividing the number of hsp65 reads assigned to each group in a sample by the total number of mycobacterial hsp65 reads from that sample.
DISCUSSION
NTM pulmonary and extrapulmonary disease is on the rise in the United States and around the world (4, 5, 46). Although many environments have been implicated in the direct transmission of disease, including hospital water supplies (47–49), the ecology of NTM across environments remains poorly understood, and studies showing how NTM taxa, including known pathogens, vary between environment types remain limited. Here we characterized the variation in the amounts and types of NTM, including those of clinical significance, across a broad range of environments frequently implicated as sources of NTM exposure, including both natural environments (soils and surface waters) and a built environment (household premise plumbing).
While we detected members of the genus Mycobacterium in nearly all samples from all three environments, the relative abundance and types of NTM differed appreciably across environments. The median relative abundance of the genus Mycobacterium in both natural environments was low compared to that of household plumbing, possibly due to higher overall abundances of other bacterial taxa in the natural environments, resulting in the genus Mycobacterium being a smaller proportion of the overall community. We also found that the three environments harbored diverse and distinct NTM communities (Fig. 3), but most of the NTM taxa identified, particularly in the natural environments, were of limited to no clinical significance. Of note, clinically significant NTM (including M. mucogenicum and M. abscessus) were more frequently detected and more abundant in household plumbing biofilms compared to the natural environments surveyed (Fig. 4). These results suggest that household plumbing biofilms, not soils or surface waters, likely pose a higher risk of NTM pathogen exposure and subsequent infection. Although clinically relevant NTM were more frequently detected in household plumbing samples, we note that detection of pathogens does not necessarily equate to a higher risk of infection from household water sources, as the association between the number of pathogenic NTM cells and disease risk remains uncertain (6). Likewise, differences in exposure routes, duration, and frequencies likely influence infection risk. However, our results do suggest that patients at risk of NTM infections are more likely to acquire infections from premise plumbing sources than from soils or surface waters, emphasizing the potential risks associated with exposures to certain environments.
Of note, the M. scrofulaceum complex, which encompasses the clinically relevant species M. scrofulaceum, M. interjectum, and M. malmoense, appears rarely in the household plumbing biofilm samples (occupancy = 0.39%) but appears more frequently in both natural environments (occupancy = 10% in soil and 2.3% in surface waters), reflecting the known susceptibility of this NTM group to the chlorination practices commonly used to treat household water supplies in the United States, practices that may be associated with a decrease in M. scrofulaceum cases in the United States (50).
Taken together, our study highlights that, while many environments harbor NTM, they do not all harbor the same amounts and types of NTM, including clinically relevant taxa which are particularly ubiquitous and abundant in premise plumbing biofilms compared to soils and surface waters. Overall, a better understanding of the ecology of NTM in source environments, including the distributions of clinically relevant NTM taxa, should help inform epidemiological investigations of NTM disease outbreaks and guide recommendations provided to disease-susceptible individuals to minimize NTM exposures. Future research should focus on the factors that contribute to NTM selection in the built environment—a likely important route of exposure for human disease. Additionally, studies investigating the temporal variability in NTM within sites, as well as the impact of water temperature on NTM community composition, with a focus on potential pulmonary and extrapulmonary pathogens, will be important for expanding our understanding of their accumulation in premise plumbing biofilms.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Brennan PJ, Nikaido H. 1995. The envelope of mycobacteria. Annu Rev Biochem 64:29–63. doi:10.1146/annurev.bi.64.070195.0003337574484 · doi ↗ · pubmed ↗
- 2Lake MA, Ambrose LR, Lipman MCI, Lowe DM. 2016. ’ “Why me, why now?” Using clinical immunology and epidemiology to explain who gets nontuberculous mycobacterial infection. BMC Med 14:54. doi:10.1186/s 12916-016-0606-627007918 PMC 4806462 · doi ↗ · pubmed ↗
- 3Winthrop KL, Marras TK, Adjemian J, Zhang H, Wang P, Zhang Q. 2020. Incidence and Prevalence of Nontuberculous Mycobacterial Lung Disease in a Large U.S. Managed Care Health Plan, 2008-2015. Ann Am Thorac Soc 17:178–185. doi:10.1513/Annals ATS.201804-236OC 31830805 PMC 6993793 · doi ↗ · pubmed ↗
- 4Dahl VN, Mølhave M, Fløe A, van Ingen J, Schön T, Lillebaek T, Andersen AB, Wejse C. 2022. Global trends of pulmonary infections with nontuberculous mycobacteria: a systematic review. Int J Infect Dis 125:120–131. doi:10.1016/j.ijid.2022.10.01336244600 · doi ↗ · pubmed ↗
- 5Bents SJ, Mercaldo RA, Powell C, Henkle E, Marras TK, Prevots DR. 2024. Nontuberculous mycobacterial pulmonary disease (NTM PD) incidence trends in the United States, 2010-2019. BMC Infect Dis 24:1094. doi:10.1186/s 12879-024-09965-y 39358723 PMC 11445848 · doi ↗ · pubmed ↗
- 6Falkinham JO 3rd. 2016. Current epidemiologic trends of the nontuberculous mycobacteria (NTM). Curr Envir Health Rpt 3:161–167. doi:10.1007/s 40572-016-0086-z · doi ↗
- 7Gebert MJ, Delgado-Baquerizo M, Oliverio AM, Webster TM, Nichols LM, Honda JR, Chan ED, Adjemian J, Dunn RR, Fierer N. 2018. Ecological analyses of mycobacteria in showerhead biofilms and their relevance to human health. M Bio 9. doi:10.1128/m Bio.01614-18 · doi ↗
- 8Gross JE, Caceres S, Poch K, Hasan NA, Davidson RM, Epperson LE, Lipner E, Vang C, Honda JR, Strand M, Strong M, Saiman L, Prevots DR, Olivier KN, Nick JA. 2021 Healthcare-associated links in transmission of nontuberculous mycobacteria among people with cystic fibrosis (HALT NTM) study: rationale and study design. P Lo S One 16:e 0261628. doi:10.1371/journal.pone.026162834929010 PMC 8687591 · doi ↗ · pubmed ↗
