High-resolution profiling of bacterial and fungal communities using pangenome-informed taxon-specific long-read amplicons
Luzia Stalder, Monika Maurhofer, Daniel Croll

TL;DR
This paper introduces a new method for high-resolution profiling of bacterial and fungal communities in the wheat microbiome using pangenome-informed long-read amplicons.
Contribution
The novel pipeline enables strain-level resolution and outperforms traditional methods like 16S and ITS amplicons.
Findings
The pipeline provides an order of magnitude higher phylogenetic resolution compared to existing ribosomal amplicons.
The method accurately captures species and strain diversity in the wheat-associated Pseudomonas microbiome and Zymoseptoria tritici.
Pangenome-informed amplicons allow tracking of microbial dynamics in the wheat phyllosphere across time and space.
Abstract
High-throughput sequencing technologies have greatly advanced our understanding of microbiomes, but resolving microbial communities at species and strain levels remains challenging. We developed and validated a pipeline for designing, multiplexing, and sequencing highly polymorphic taxon-specific long-read amplicons. We focused on the wheat microbiome as a proof-of-principle and demonstrate strain-level resolution for the wheat-associated Pseudomonas microbiome and the ubiquitous fungal pathogen Zymoseptoria tritici. We achieved an order of magnitude higher phylogenetic resolution compared to existing ribosomal amplicons. The designed amplicons accurately capture species and strain diversity outperforming full-length 16S and ITS amplicons. Furthermore, we tracked microbial communities in the wheat phyllosphere across time and space to establish fine-grained species and strain-specific…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8- —http://dx.doi.org/10.13039/501100001711Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenomics and Phylogenetic Studies · Microbial Community Ecology and Physiology · Plant-Microbe Interactions and Immunity
Introduction
High-throughput sequencing technologies (HTS) have revolutionized our understanding of microbiomes revealing insights into structures, functions and dynamics. From human health to agricultural systems, microbial communities are shaped by host identity, environmental conditions, and biotic interactions, with implications for ecosystem functioning and host outcomes [1–6]. Targeted amplicon sequencing is the most commonly employed HTS method for microbiome analyses interrogating conserved loci such as 16S ribosomal DNA for prokaryotes and internal transcribed spacers (ITS) for eukaryotes. This approach is particularly valuable in scenarios with low microbial biomass or high host DNA contamination—a common challenge in plant, environmental, and clinical microbiome research [7]. While metagenomic sequencing provides richer functional insights, its utility is often limited by the need for high microbial abundance [7].
A persistent limitation of conventional amplicon sequencing is the restricted taxonomic resolution below the genus level, which obscures ecologically and functionally critical diversity within microbial groups [7–9]. Recent advances in long-read sequencing (e.g., PacBio, Oxford Nanopore) now enable high-accuracy sequencing of full-length ribosomal markers (16S, ITS) or even multi-gene operons (e.g., 16S-ITS-23S), significantly improving resolution [9–14]. PacBio circular consensus sequencing (CCS) generates highly accurate (> 99.9%) reads comparable to Illumina and Sanger sequencing but reaching up to 20 kb in length [15].
Yet, even these advances may fall short of capturing strain-level diversity, which is often pivotal for understanding microbial interactions, niche adaptation, and functional impacts. For example, within the ubiquitous genus Pseudomonas, closely related species can exhibit starkly divergent ecological roles: while P. syringae strains are often plant pathogens, P. fluorescens, P. protegens, P. chlororaphis, and P. putida species typically promote plant growth [16–22]. Such functional divergence is driven by strain-specific traits, including metabolite production, which underscores the need for tools that resolve fine-grained microbial diversity [21, 23–25].
To address this gap, we present an approach for designing, multiplexing, and sequencing highly polymorphic taxon-specific amplicons using PacBio CCS. While universal ribosomal markers remain essential for community profiling, our approach complements these tools by enabling deep resolution of select ecologically relevant taxa. Our pipeline allows to track complex bacterial and fungal communities at the species and strain level from environmental samples. While comprehensive profiling across all phyla is desirable, current technological limitations make it challenging to multiplex a wide range of taxon-specific amplicons without compromising recoverability and balanced amplification. Therefore, focusing on specific taxa of interest allows for in-depth analysis within a manageable experimental framework. As a proof-of-principle, we focus on the microbiome of wheat—a system of broad agricultural relevance—where high-resolution profiling is critical for understanding microbial interactions affecting crop health [26]. The microbiome of wheat leaves is often dominated by the pathogenic fungus Zymoseptoria tritici (causing septoria tritici blotch) [27–29], and Pseudomonas spp., which exhibit strain-specific antagonistic interactions [26, 30]. For instance, Z. tritici suppresses the host immune system to facilitate colonization by strains of the P. syringae group [31], while specific P. fluorescens strains inhibit the growth of Z. tritici [32].
In this study, we introduce and validate a pipeline for a novel suite of highly multiplexed amplicons that accomplish an order of magnitude higher phylogenetic resolution compared to existing ribosomal amplicons. We target the wheat-associated Pseudomonas microbiome and the major fungal pathogen Z. tritici. We achieve species and strain-level resolution for both groups in environmental samples, and we demonstrate substantial gains in phylogenetic resolution by tracking strains across the wheat canopy and over time. Furthermore, we highlight the pipeline’s adaptability to other ecologically or clinically significant taxa (e.g., Rhizobia, Streptomyces, Aspergillus).
Methods
Wheat leaves sampling
We analyzed eight elite European winter wheat (Triticum aestivum) varieties sampled at five different timepoints over the growing season (i.e., cultivars Aubusson, Arobase, Lorenzo, CH Nara, Zinal, Simano, Forel, and Titlis on the 23.05.2019, 06.06.2019, 27.06.2019, 15.07.2019, and 22.07.2019, respectively). The wheat developed from the growth stage Feekes 7.0 until 11.4 [33]. Two biological replicates of the wheat panel were grown in two complete block designs separated by approximately 100 m at the field phenotyping platform site of the Eschikon Field Station of the ETH Zurich, Switzerland (coordinates 47.449°N, 8.682°E). The cultivars were grown in plots of 1.2-by-1.7-m, with the genotypes arranged randomly within each block. No fungicides were applied. Leaves were infected naturally by a local population of Z. tritici and the epidemic was allowed to develop naturally. Infection by Z. tritici, assessed based on host damage, was widespread across the field experiment. For each cultivar, block, and timepoint, two plants were collected. From each plant, three leaves were collected: the bottom leaf touching the ground (first leaf), the lowest leaf not touching the ground (second leaf), and then the flag leaf (fourth leaf). Each leaf was immediately stored in a plastic foil to avoid contamination, stored at 4 °C overnight before processing. In total, 480 leaves were collected (8 cultivars × 2 blocks × 5 timepoints × 2 plants × 3 leaves).
Mock community creation
We established three sets of mock communities using ten different Pseudomonas strains and two Z. tritici isolates (Supplementary Table 1). The first set was composed of a ten-fold serial dilution series of the DNA from two Pseudomonas isolates (P. syringae Leaf129 and P. thivervalensis PITR2) and two Z. tritici isolates (ST99CH 1E4 and ST01IR 48b), up to a dilution of 10^−5^. The DNA input per sample was 15 ng, with 7.5 ng DNA each from Pseudomonas and Z. tritici for the undiluted sample. For the 10^−1^ dilution, we used 0.75 ng DNA for the Pseudomonas and Z. tritici isolate that was to be diluted, together with 6.75 ng of the background Pseudomonas and Z. tritici isolate respectively. The second set consisted of a ten-fold serial dilution series up to 10^−5^, where P. thivervalensis PITR2 and ST99CH 1E4 were diluted in a leaf DNA sample. We used two different leaf DNA extracts, Leaf 1 (Timepoint 2, Cultivar Zinal, Plot 2, Plant 2, Canopy height: Fourth leaf) and Leaf 2 (Timepoint 3, Cultivar Titlis, Plot 2, Plant 2, Canopy height: Fourth leaf). The DNA input per sample was 15 ng, starting with 0.75 ng of Pseudomonas and Z. tritici isolates, respectively, and 13.5 ng of leaf DNA for the 10^−1^ dilution. For the third set, we combined increasing numbers of Pseudomonas isolate DNA extracts (from two to ten isolates) in equimolar concentrations. The total DNA input per mixture was 15 ng, adding 7.5 ng per isolate for the two-strain mixture and 1.5 ng for the 10-strain mixture. To estimate the number of cells per input DNA, we based our calculations on the average genome size of Pseudomonas (6.5 Mbp) and Z. tritici (39.7 Mbp), respectively. We used a conversion factor of 660 daltons per base pair for these calculations.
Pangenome construction
For pangenome construction, we used Panseq with the following settings: fragmentationSize = 5000, minimumNovelRegionSize = 500, novelRegionFinderMode = vno_duplicates, percentIdentityCutoff = 60, runMode = pan, storeAlleles = 1, allelesToKeep = 2, frameshift = 1, overwrite = 1, maxNumberResultsInMemory = 500, blastWordSize = 11, nucB = 200, nucC = 65, nucD = 0.12, nucG = 90, nucL = 20, cdhit = 0, sha1 = 0 [34]. To create the Pseudomonas pangenome, we selected 19 high-quality genomes representing all subgroups of the genus [35, 36] (Supplementary Table 2). For the Z. tritici pangenome, we used 19 global reference isolates used for the pangenome analysis of Badet et al. 2020 (Supplementary Table 2). For the A. fumigatus pangenome creation, we used all publicly available genomes of the pangenome study of Barber et al. 2021 (n = 253, Supplementary Table 2). For the Rhizobia pangenome, we used the genomes of the pangenome analysis of Yang et al. 2020 (n = 84, Supplementary Table 2). For the Streptomyces pangenome, we based the genome selection on Kieper et al. 2023. Their study clustered all available Streptomyces genomes into 186 distinct species using average nucleotide identity (ANI) criteria. We filtered the genomes in two steps to ensure they met our quality standards. First, we represented each species cluster with one genome per species to avoid redundancy. Second, we retained only complete genomes or chromosome-level assemblies. After applying these filters, 97 genomes met the criteria and were used for pangenome construction (Supplementary Table 2). Bacterial plasmid sequences were removed for all pangenome analysis. Furthermore, to be able to better compare the Rhizobium, the Streptomyces and the A. fumigatus pangenome characteristics to the Pseudomonas and Z. tritici pangenomes, we subset the genome sets to 19 genomes each for a second pangenome construction based on the same number of genomes (Supplementary Table 2). Regions that were present in > 50% of the pangenome isolates were selected as “core” for further analysis of the amplicon primer design. We used this relaxed criterion to be able to scan divergent regions for amplicons potentially revealing the highest resolution among strains. Supplementary Table 3 details the Pseudomonas pangenome core regions, and Supplementary Table 4 details the Z. tritici pangenome core regions. Orthologues genes of all pangenomes were identified using Orthofinder v2.5.5 with default parameters [37].
Amplicon primer design
For each pangenome core fragment of the Pseudomonas and the Z. tritici pangenome, we created a multiple sequence alignment using muscle v.3 with default parameters [38]. From each core multiple sequence alignment, we created a consensus sequence using the EMBOSS v.6 cons function, setting the identity to ten and plurality to 0.8 [39]. We then ran Primer3 v. 2.5 on each core consensus sequence [40]. We used the following Primer3 settings: PRIMER_PICK_LEFT_PRIMER = 1, PRIMER_PICK_INTERNAL_OLIGO = 0, PRIMER_PICK_RIGHT_PRIMER = 1, PRIMER_OPT_SIZE = 20, PRIMER_MIN_SIZE = 18, PRIMER_MAX_SIZE = 22, PRIMER_PRODUCT_SIZE_RANGE = 1500–3100, PRIMER_NUM_RETURN = 8, PRIMER_EXPLAIN_FLAG = 1. For each primer pair, we cut out the respective amplicon from the multiple sequence alignment using the EMBOSS v. 6 extractalign function [39]. To evaluate amplicon diversity, we calculated the nucleotide diversity of each amplicon multiple sequence alignment within the pangenome. For this, we first converted the multiple sequence alignment to vcf format using snp-sites [41], and then vcftools v0.1.16 with the options –sites-pi –haploid to calculate the nucleotide diversity [42]. For both Pseudomonas and Z. tritici, we selected the eight amplicons with the greatest summed nucleotide diversity for further evaluation (Supplementary Tables 5 and 6). Each Pseudomonas candidate primer pair was then blasted against all available Pseudomonas genomes from the ncbi nucleotide collection using BLASTn to ensure matches in all genomes [43]. All hits were aligned using MAFFT v7.427 with the option –auto [44, 45]. For Z. tritici, we analyzed each candidate amplicon against the worldwide collection of 1109 Z. tritici isolates [46]. Here, amplicon sequences were extracted from variant call files (i.e., vcf) using bedtools filter [47]. For both Pseudomonas and Z. tritici amplicon evaluation, we assessed allele frequencies at every position of the multiple sequence alignment using base R functions [48]. We manually adjusted primer sequences to account for the specific combinations of alleles necessary to amplify all known Pseudomonas, or* Z*. tritici sequences, respectively. Primer sequences and nucleotide diversity values of the tested amplicons are documented in Supplementary Tables 5 and 6.
All eight Pseudomonas and eight Z. tritici primer candidates were tested on each of five reference cultures of Pseudomonas and Z. tritici, respectively, as well as on naturally infected wheat leaf samples. The optimization process involved the following key steps. First, we established an annealing temperature gradient ranging from 55 to 65 °C to determine the optimal temperature for amplification. We also evaluated two touch-down PCR protocols: the first protocol varied from 66 to 56 °C, while the second ranged from 63 to 53 °C. To estimate the sensitivity of the assay, we tested various DNA input amounts ranging from 0.001 to 5 ng. To ensure the specificity of the amplified products, we performed Sanger sequencing on two strains per amplicon. Amplification products were resolved by agarose gel electrophoresis. Based on these tests, we selected the two primer pairs for each organism that demonstrated the most consistent (i.e., amplified across all conditions) and specific (i.e., no off-target bands) amplification success across the tested range of input material and isolates. These selected primers were then utilized in subsequent experimental steps.
Sample homogenization, DNA extraction, amplification, pooling, and cleanup
Leaves were lyophilized for 48 h and weighed. Then, the complete leaves were homogenized using 0.5 mm and 0.2 mm zirconium beads in the Bead Ruptor bead mill homogenizer (OMNI) using the following settings: speed 5.00, number of cycles 2, time of cycle 1:00, time distance between cycles 1:00. DNA extraction was performed with automated magnetic-particle processing using the KingFisher Flex Purification Systems (Thermo Scientific). To enhance the DNA extraction of fungal and bacterial DNA, lyticase (from Arthobacter Lueteus, Sigma-Aldrich) and lysozyme (Sigma-Aldrich) was added to the first lysis step with PVP buffer (LGC Genomics). Specifically, for 10 mg dry leaf mass 3.9 µl lyticase (200,000 U/mg, diluted to 6.5 mg/ml), 3.9 µl lysozyme (22,500 U/mg, diluted to 10 mg/ml), and 98 µl PVP lysis buffer was added, and samples were incubated at 55 °C for 30 min. Then, 3.9 µl proteinase K (Promega, 30 U/mg, diluted to 10 mg/ml) for 10 mg dry mass was added and incubated at 55 °C for 30 min. From each sample, 150 µl of clear lysate was transferred to an empty binding plate (KingFisher Flex, Thermo Scientific). For each sample, 360 µl PN binding buffer (LGC Genomics), 30 µl well suspended Sbeadex beads (LGC Genomics) were added. Using the KingFisher Flex, the first washing step was performed using 400 µl PN1 buffer (LGC Genomics) per sample, then a second wash using 390 µl buffer PN1 (LGC Genomics) with 10 µl RNase A (Merck, lyophilized, diluted to 10 mg/µl in water) and a third wash using 400 µl PN2 buffer (LGC Genomics). Each sample was eluted in 100 µl nuclease-free water. The DNA concentration was measured using the Spark Microplate reader (Tecan). Then, DNA concentrations were diluted to 5 ng/µl using the liquid handling station (Pipetting robot with flowbox Nr. 709,402, Brand Inc.). PCR reactions were pipetted using the Mosquito HV liquid handling robot (SPT Labtech). The first amplicon PCR reaction was performed in a 15 µl reaction volume. Specifically, 7.5 µl KAPA HiFi HotStart ReadyMix (2x), 1.5 µl forward primer (3 µM), 1.5 µl reverse primer (3 µM), 3 µl DNA (5 ng/µl), and 1.5 µl HPLC water were combined. Primer sequences and cycling protocols are documented in Supplementary Tables 5 and 7. All primers were synthesized by IDT (Integrated DNA Technologies, Coralville, IA). Pseudomonas and Z. tritici-specific amplicon PCR products were diluted 1:5, PCR products from 16S and ITS 1:10. The second barcoding PCR reaction was performed in a 25 µl reaction volume. Specifically, 12.5 µl KAPA HiFi HotStart ReadyMix (2 ×) (Roche), 2.5 µl M13 forward barcode (3 µM), 1.5 µl M13 reverse barcode (3 µM), 2 µl diluted PCR product, and 5.5 µl HPLC water were combined. Barcode sequences are available in Supplementary Table 8. Samples were pooled by amplicon taking 1.5 µl from each barcoded product. Each amplicon pool was cleaned using AMPure XP beads (Beckman Coulter Life Sciences) using a bead ratio of 0.8 ×.
Library preparation and sequencing
Library preparation and sequencing was carried out at the Functional Genomics Centre Zurich (FGCZ). Two SMRTbell libraries were prepared for each amplicon length using the SMRTbell prep kit 3.0. One for the long 3-kb amplicons, one for the 1.5 kb 16S and ITS amplicons. Size selection was performed using BluePippin (Sage Science) with a 0.75% dye-free cassette for each library. In the sequencing run, the 3-kb and 1.5 kb amplicon pools were combined in a 3:2 ratio to compensate for the inherent tendency of this sequencing technology to produce fewer reads for longer amplicons. This ratio resulted in slightly more reads for the longer amplicons, as documented in Supplementary Table 9. PacBio sequencing was performed on a Sequel II machine with the SMRT 8 M cell. Two sequencing runs were conducted: the first one using SMRT Link version 10.1 for all leaf samples, and the second using SMRT Link version 11.1 for all mock samples. The second run involved less multiplexing, resulting in a higher number of reads for the mock samples compared to the leaf samples, as indicated in Supplementary Table 9.
PacBio raw read processing
CCS were extracted from raw reads using the ccs software from the bioconda package pbccs provided by the manufacturer (Pacific Biosciences). Only reads with an accuracy exceeding 99% (Q20, minimum RQ) were included in the analysis. For the first run, pbccs v. 6.0.0 was used and for the second run v. 6.3.0. We split the CCS reads by barcodes using the software lima 2.0.0 (Pacific Biosciences) with the following parameters: –log-level INFO –per-read –min-passes 0 –split-bam-named –ccs –different -A 1 -B 3 -D 2 -I 2 -X 0. We assigned the reads to the respective amplicons using BLASTn assignments to reference amplicon from the P. fluorescens CHA0 and Z. tritici isolate 1E4 [43]. All six reference amplicons were blasted against each read and reads were then assigned to the reference hit with the lowest e value, the highest length and the highest identity. We removed primer sequences using cutadapt v. 3.4 with the following parameters: -a FORWARD_PRIMER_SEQ…REVERSE_PRIMER_SEQ –discard-untrimmed –revcomp [49]. Primers were treated as linked, i.e., reads without primers at both ends were discarded. The R package dada2 v. 1.28.0 was used to infer ASVs for each amplicon separately [50]. In the following, dada2 steps for each amplicon are described. Reads were filtered and trimmed using the function filterAndTrim with the parameters minLen = minLength, maxLen = maxLength, rm.phix = FALSE, maxEE = 2, qualityType = “FastqQuality,” multithread = TRUE. Length ranges for each amplicon are described in Supplementary Table 10. Reads were dereplicated using the function derepFastq with the parameters verbose = TRUE, qualityType = “FastqQuality.” Error models were estimated with the function learnErrors with the parameters errorEstimationFunction = dada2:::PacBioErrfun, BAND_SIZE = 32, multithread = TRUE. Reads were denoised using the function dada with the parameters BAND_SIZE = 32, multithread = TRUE, pool = FALSE. Importantly, we did not use the option pool = TRUE, as this introduced spurious ASVs in the reference cultures of the mock community. ASV sequence tables were generated using the function makeSequenceTable. To remove chimeras from the sequence table, the function removeBimeraDenovo with parameters method = “consensus,” minFoldParentOverAbundance = 3.5, multithread = TRUE, verbose = TRUE was used. Supplementary Fig. 1 shows the read tracking through the dada2 pipeline for each amplicon. Taxon-specific ASVs were searched using blast against the NCBI nt database to verify whether all reads show the highest identity with Pseudomonas and Z. tritici sequences, respectively [43].
Taxonomic classification
We assigned 16S reads using the dada2-formatted Silva database v. 138 [51] and ITS reads to the UNITE database v. 8.3 [52]. To perform taxonomy assignment, we used the function assignTaxonomy from the dada2 package. As the UNITE database comprises mostly ITS1-ITS2 reference sequences and only few full-length ITS-LSU sequences, we truncated all ITS reads to the ITS1-ITS2 fragment for assignment. We blasted the reads against the ITS1-ITS2 fragment of Z. tritici strain S-46 (KT336200.1) and used the hit coordinates to truncate reads with the seqkit software function subseq [53]. We assigned Pseudomonas reads of the rpoD, transporter and 16S amplicons to Pseudomonas species using BLASTn against all 1071 full-length Pseudomonas genomes available from the Pseudomonas db v. 21.1 (2022–11-20) (Supplementary Table 11) [43, 54]. The best assignment was chosen according to BLASTn bitscores. We assigned Z. tritici reads of the Z. tritici amplicons on chromosomes 9 and 13, and of the ITS amplicon to a database of previously sequenced Z. tritici strains. For this, we used draft assemblies of previously collected 177 genomes from the Eschikon Field Station of the ETH Zurich, Switzerland [55], as well as genomes from the reference pangenome [56]. The best assignment was chosen according to bitscores.
Alignment, phylogenetic tree, and network construction
We performed multiple sequence alignment using PASTA v.1.9.0 with the following MAFFT arguments: –leavegappyregion –6merpair –maxiterate 0 –adjustdirection –reorder and FastTree model -gtr -gamma -fastest [57]. We built phylogenetic trees of leaf samples using FastTree v.2.0.0 [58], and of the Pseudomonas mock samples using raxml-ng v. 1.2.0 with the options –all –model GTR + G –opt-model on –threads 4 –seed 2 –outgroup “Main ASV of P. syringae Leaf129” [59]. To create unrooted phylogenetic networks for the Z. tritici amplicon 1 on chromosome 13, amplicon 2 on chromosome 9, and the ITS, the software SplitsTree v. 4.19.1 was used using uncorrected p distances [60]. For in-silico analysis of Pseudomonas amplicons, we constructed ANI dendrograms based on all Pseudomonas whole genomes from the Pseudomonas db v. 21.1 (2022–11-20) (Supplementary Table 11) [54]. ANI dendrograms were calculated using the software ANIclustermap [61] and the R package pvclust v. 2.2.0 [62]. We extracted in-silico amplicons from whole Pseudomonas genomes using BLASTn [43] and aligned them using MAFFT and then calculated phylogenetic trees using iqtree [44, 45, 63]. We calculated generalized Jaccard–Robinson–Foulds (JRF) and Mutual clustering information (MCI) using the TreeDist package v. 2.9.1 in R to compare the ANI dendrogram to the amplicons [64].
Statistics
Statistical analyses were conducted using R and its built-in stats package v. 4.2.2 [65]. Regression analyses were performed using the lm function, while correlations were calculated using the cor.test function. For non-parametric comparisons, Kruskal–Wallis tests were carried out using the kruskal.test function. Shannon diversity was calculated using the function diversity from the R package vegan v. 2.6.4 [66]. Permutations were calculated for 1000 iterations using the base R function sample with replace = TRUE [48]. Subtrees were identified using the function subtrees from the R package ape v. 5.7.1 [67]. To calculate the length-normalized ribosomal nucleotide diversity from the pangenomes, the 16S and ITS sequences were identified with BLASTn [43]. The hits were aligned using muscle v.3 with default parameters [38]. Nucleotide diversity was calculated from the multiple sequence alignments using vcftools v0.1.16 with the options –sites-pi –haploid [42]. For improved comparability, the nucleotide diversity was normalized by dividing it by the alignment length and the number of genomes in the pangenome.
Visualization
To visualize ASV abundances, the R package phyloseq v. 1.42.0 was used [68]. Counts were normalized by sample using the phyloseq function transform_sample_counts(phyloseq_object, function (x) x/sum(x)). The heatmap was created using the R package pheatmap v. 1.0.12 [69]. Table numbers were visualized with the method = “circle” of the R package corrplot v. 0.92 [70]. Phylogenetic trees were visualized using the R package ggtree v. 3.6.2 [71, 72]. All other plots were created using the R package ggplot2 v. 3.4.2 [73]. Organism and machine icons were created with BioRender.com.
Results
Pangenome-informed design of taxon-specific amplicons
We developed new 3-kb long amplicons to enhance the resolution of bacterial and fungal community profiling while maintaining amplification within the targeted group of organisms (Fig. 1A). To identify highly polymorphic Pseudomonas amplicons, we constructed a comprehensive pangenome of 19 high-quality genomes representing all subgroups of the genus. Genome selection was based on the 4-gene multilocus sequence analysis (MLSA) framework, where we included two genomes per major phylogenetic group and one genome per subgroup for highly diverse lineages (Supplementary Table 2) [36, 74]. While whole-genome data remain unevenly distributed across groups, we prioritized representative diversity over redundant sampling, as including closely related genomes did not improve diversity assessment but complicated core-region filtering. From this analysis, we identified 1059 core regions conserved in the Pseudomonas pangenome. These core regions served as candidate regions for primer development. For each core fragment, we designed all possible amplicons ranging from 2.7 to 3.2 kb resulting in 224 amplicon candidates. Nucleotide diversity in aligned core regions was used as a metric to prioritize amplicon candidates.Fig. 1. Pangenome-informed design of taxon-specific amplicons. A Schematic overview of amplicon design pipeline. Conserved blocks in the Pseudomonas and Z. tritici pangenomes were extracted. Consensus sequences of the core blocks were used to design amplicons. Amplicon candidates were ranked according to nucleotide diversity, and primer sequences were adjusted. Polymorphic amplicons were tested on reference strains and environmental samples. B Overview of multiplexed amplicon evaluation on environmental samples. DNA from 30 mock samples and 480 wheat leave samples were automatically extracted. Two Pseudomonas-specific amplicons, two Z. tritici-specific amplicons, as well as the full-length 16S and ITS were amplified in each sample and prepared using robotic liquid handling for accuracy and throughput. Reactions were performed in triplicates and barcoded individually using asymmetric M13 barcodes resulting in a total of 18,360 PCR reactions. All reactions were pooled and sequenced in two PacBio Sequel II 8 M cells
We selected the eight most polymorphic amplicon candidates for PCR evaluation using both Pseudomonas reference strains and naturally colonized wheat leaf samples. Candidate primer sequences were adjusted based on alignment against all available sequences of Pseudomonas strains (n = 1071) to maximize recovery of the Pseudomonas diversity. We allowed primer candidates to include up to five degenerate bases. To reduce the number of sequence variants to be considered in primer candidates, we reduced degenerate positions to match known combinations in Pseudomonas genomes. Finally, we selected two amplicons that consistently showed the strongest PCR amplification for further analyses. The first amplicon spanned a locus including the rpoD gene, previously utilized for taxonomic classification [35, 36]. The second amplicon encompassed genes encoding an ABC transporter (APE98195.1) and an ABC transporter permease (APE98194.1), respectively. We evaluated the newly designed amplicons against the full genome phylogeny of Pseudomonas genomes based on ANI (Fig. 2 and Supplementary Fig. 2). Specifically, we found that the taxon-specific amplicons show lower Jaccard-Robinson-Foulds (JRF) distances and mutual clustering information (MCI) to the ANI phylogeny compared to the full-length 16S, as well as the rrn operon and the rpoD amplicon previously used for Pseudomonas multilocus sequence analysis [13, 14, 36, 74]. We pursued a parallel approach to identify primers suitable for amplifying the intra-specific diversity of the fungal pathogen Z. tritici. We based our analysis on a recently established reference-quality global pangenome for the species [56]. The two best-performing amplicons were located on chromosomes 9 and 13, respectively.Fig. 2. Comparison of Pseudomonas ANI phylogenies and taxon-specific amplicons. A Comparison between the Pseudomonas average nucleotide identity (ANI) dendrogram and the phylogenetic tree constructed using full-length 16S rRNA sequences. Tree nodes are colored according to the Pseudomonas group and subgroup, respectively. B Comparison between the Pseudomonas ANI dendrogram and the phylogenetic trees generated from PacBio sequences of the rpoD and Transporter amplicons. C Generalized Jaccard–Robinson–Foulds (JRF) distances and mutual clustering Information (MCI) between the Pseudomonas ANI dendrogram and following amplicons: full-length 16S rRNA (~ 1500 bp), rrn operon (~ 4200 bp), short rpoD amplicon (~ 720 bp), PacBio rpoD (~ 2900 bp), and PacBio transporter (~ 2700 bp) amplicons
To evaluate the performance of the two Pseudomonas-specific and two Z. tritici-specific amplicons, we conducted tests using an extensive mock community of well-characterized laboratory strains. In addition to the new amplicons, we amplified the full-length 16S and ITS loci to contrast performance on the same sample pools. Next, we tested the Pseudomonas and Z. tritici-specific amplicons on a diverse set of field-collected wheat leaf samples. The individual leaves were obtained from multiple winter wheat varieties across five timepoints during the growing season at an experiment field site near Zurich, Switzerland (Fig. 1B). To ensure optimal DNA recovery, we refined extraction protocols for bacterial and fungal DNA yield. Subsequently, we amplified the four taxon-specific amplicons—two Pseudomonas-specific and two Z. tritici-specific—in 30 mock and 480 wheat leaf samples. Each reaction was performed in triplicate, resulting in 9180 amplifications. Samples were uniquely tagged with asymmetric M13 barcode combination. Liquid handling was performed with a robot and samples were processed in a fully randomized layout. We sequenced all 9180 reactions on two PacBio Sequel II flow cells resulting in a total of 1.5 M high-quality reads used in all subsequent analyses (Supplementary Table 9).
Our analysis of 480 individual wheat leaf samples revealed a diverse assembly of bacteria. Based on the full-length 16S amplicon, we found Pseudomonas and Sphingomonas to be the dominant genera (Fig. 3A and Supplementary Fig. 3). The Pseudomonas-specific rpoD and transporter amplicons revealed a diverse assembly of species, primarily belonging to the P. fluorescens and P. syringae groups (Fig. 3B and Supplementary Fig. 3). The mycobiome, analyzed using the full-length ITS, was predominantly composed of the fungal genera Zymoseptoria and Cladosporium. While Z. tritici amplicons 1 and 2 were analyzed for strain-level diversity, we note that—unlike Pseudomonas ASVs, which resolve into distinct species—Z. tritici ASVs could not be assigned to strains due to the species’ high recombination rate. Even though individual Z. tritici ASVs do not provide a strong association to genome-wide diversity polymorphism in recombinant field populations, the diversity detected at the ASVs provides a strong estimator of overall diversity of the species. We focused on genus-level resolution, which is broadly consistent with recent findings in wheat fields [27, 29]. However, our dataset showed a higher abundance of Cladosporium. This could be attributed to the humid conditions during our sampling season or the limited resolution of the order Capnodiales in other amplicon studies on wheat leaves. These studies all analyzed the short ITS1 amplicon (approximately 350 bp), compared to our use of the full-length ITS amplicon (approximately 1500 bp) (Fig. 3C and Supplementary Fig. 3).Fig. 3. Microbial diversity on wheat leaves. A Relative abundances of bacterial genera on wheat leaves determined by the full-length 16S amplicon. B Relative and mean abundances of Pseudomonas species on wheat leaves assessed by the Pseudomonas-specific rpoD and transporter amplicons. C Relative abundances of fungal genera on wheat leaves determined by the full-length ITS amplicon. For all panels, the most abundant species are color-labeled, while less abundant species are shown in grey. Analyzed are 480 wheat leaves from eight different cultivars and five different timepoints across the growing season
Detection limits of taxon-specific amplicons
To assess the performance and detection limits of the new amplicons, we analyzed both defined mock communities and dilution series. For the Pseudomonas amplicons, we examined a panel of ten isolates representing the phylogenetic diversity of the genus (Supplementary Table 1). Our results showed that both the full-length 16S and the Pseudomonas-specific rpoD and transporter amplicons correctly distinguish all isolates (Fig. 4A). All isolates consistently amplified for all three amplicons, except for the P. putida Leaf58 isolate failing for the transporter amplicon. No genome sequence is available for verification, but amplification failure may be caused by primer mismatches. The rpoD and transporter amplicons exhibited clustering of the eight P. fluorescens group isolates according to their subclade described by Flury et al. 2016 [75]. While the rpoD and transporter amplicons predominantly produced a single amplicon sequence variant (ASV) per culture, the 16S amplicon exhibited multiple ASVs per culture. This can be attributed to the multi-copy nature of the 16S gene that is variable among the different Pseudomonas species analyzed here (Supplementary Table 1) [76, 77].Fig. 4. Detection limits of taxon-specific amplicons using mock communities. A Phylogenetic trees of amplified sequence variants (ASVs) from ten individual Pseudomonas cultures based on the full-length 16S amplicon, the Pseudomonas-specific rpoD and transporter amplicon. The Pseudomonas cultures originate from the following strains: P. syringae Leaf129 (Syringae L129), P. thivervalensis PITR2 (Fluo S2 PITR), P. putida Leaf58 (Putida L58), P. chlororaphis ssp. piscium PCL1391 (Fluo S1 PCL), P. fluorescens type strain DSM50090 (Fluo S3 DSM), P. chlororaphis ssp. chlororaphis type strain LMG5004 (Fluo S1 LMG), Pseudomonas sP. 1.31 (Fluo S2 sp1.31), Pseudomonas sP. SS101 (Fluo S3 SS101), P. protegens CHA0 (Fluo S1 CHA0), and P. brassicacearum TM1A3 (Fluo S2 TM1A3). For the isolates from the P. fluorescens group, the subclades Fluo S1, S2, and S3 are indicated as defined by Flury et al. (2016). Colors indicate different cultures, and the dot size corresponds to the abundance of the ASVs for each of the two replicates. The transporter amplicon failed to amplify the P. putida Leaf58 isolate. B Mixed equimolar combinations of the ten Pseudomonas cultures showing read counts of the main ASV for each mixture. The three replicates of each equimolar mixture are shown individually. C Top panel: Serial dilution series of P. syringae Leaf129 (green) mixed into P. thivervalensis PITR2 (grey). Read numbers are shown for the full-length 16S, and the Pseudomonas-specific rpoD and transporter amplicons. The expected read numbers of the diluted P. syringae Leaf129 isolate are indicated in light grey and calculated from the background read numbers of P. thivervalensis PITR2. Bottom panel: Serial dilution series of Z. tritici ST99CH 1E4 (green) in Z. tritici ST01IR 48b (grey) isolates. Read numbers are shown for the full-length ITS, and the Z. tritici-specific amplicon1(on chromosome13) and amplicon 2 (on chromosome 9). The expected read numbers of the diluted Z. tritici ST99CH 1E4 isolate are indicated in light grey and calculated from the background read numbers of the Z. tritici ST01IR 48b isolate. D Top panel: Read numbers of a serial dilution series of P. thivervalensis PITR2 (green) mixed into a wheat leaf DNA extract background for the full-length 16S amplicon and the Pseudomonas-specific rpoD and transporter amplicons. In grey, the total read count of the wheat-endemic bacterial isolates is indicated for the 16S amplicon, respectively the total read count of the wheat-endemic Pseudomonas for the Pseudomonas-specific rpoD and transporter amplicons. Bottom panel: Read numbers of a serial dilution series of Z. tritici ST99CH 1E4 (green) mixed into a wheat leaf DNA extract background for the full-length ITS and the Z. tritici-specific amplicons. In grey, the total read count of the wheat-endemic fungal isolates is indicated for the ITS amplicon, respectively the total read count of the wheat-endemic Z. tritici for the Z. tritici-specific amplicons. For all serial dilutions of (C) and (D), dilutions were performed with an initial DNA input of 7.5 ng per isolate. We maintained a consistent amount of input DNA for amplification across all samples and pooled the amplified products equimolarly for each amplicon. Consequently, total read numbers per sample remained constant across each dilution series. To compensate for PacBio sequencing’s tendency to produce more reads for shorter fragments, we pooled long taxon-specific amplicon products with shorter amplicons at a ratio of 2:1
Following the assessment of taxon-specific amplicon performance on single isolates, we evaluated performance in strain mixtures. For this, we created equimolar mixtures containing two to ten closely related Pseudomonas isolates to create a challenging scenario for the amplicons to discriminate. We found that the 16S, the rpoD, and the transporter amplicons successfully recovered ASVs from individual isolates even in complex mixtures (Fig. 4B). However, the rpoD amplicon showed reduced amplification efficiency for the isolate P. chlororaphis subsp. piscium PCL1391 and P. fluorescens SS101 in the mixture. The transporter amplicon exhibited lower amplification efficiency for the isolate P. fluorescens SS101. Furthermore, we observed greater variation in abundance between replicates for the 16S amplicon compared to the rpoD and transporter amplicons.
We prioritized very high levels of multiplexing to maximize sample throughput, which—given fixed sequencing capacity—resulted in relatively low read counts per sample. To assess detection limits in our experimental setup, we analyzed a two-strain dilution series consisting of the P. syringae Leaf129 and P. thivervalensis PITR2 isolates (Fig. 4C, Supplementary Fig. 4, Supplementary Table 12). We maintained a consistent amount of input DNA for amplification across all samples and pooled the amplified products equimolarly for each amplicon. To compensate for PacBio sequencing’s tendency to produce more reads for shorter fragments, we pooled long taxon-specific amplicon products with shorter amplicons at a ratio of 2:1. Both Pseudomonas-specific amplicons were able to identify the diluted isolates at a concentration as low as 10^−3^, which corresponds to 7.510^−3^ ng input DNA or approximately 1050 cells (Fig. 4C). Similarly, we performed a comparable analysis for the fungal pathogen Z. tritici using a two-strain dilution series comprising Z. tritici isolates ST99CH 1E4 and ST01IR 48b (Fig. 4C, Supplementary Fig. 4, Supplementary Table 12). The ITS and Z. tritici-specific amplicons successfully detected diluted strains down to a concentration of 10^−2^, corresponding to 7.510^−2^ ng input DNA or approximately 1720 cells. Polyphenols contained in plant DNA extracts can inhibit amplification. To test for this, we replicated the dilution series by diluting reference isolates with leaf samples. We obtained similar amplification yields from culture samples compared to plant samples indicating that the remaining plant extracts did not significantly affect the amplification (Fig. 4D, Supplementary Fig. 4, Supplementary Table 12).
Discriminant power of amplicons to resolve strain genotypes
To assess gains in phylogenetic resolution of the wheat microbiome, we compared the Pseudomonas and Z. tritici amplicons to 16S and ITS amplicons, respectively. Analyzing 480 wheat leaf samples, Pseudomonas-specific amplicons revealed in total 933 and 538 ASVs at the rpoD and transporter locus, respectively. In contrast, the 16S amplicon revealed only 86 ASVs matching the genus Pseudomonas (Fig. 5 A and B, Supplementary Fig. 5). This represents a three-fold (2.7X for rpoD and 3.3X for the transporter) increase in ASVs for the Pseudomonas-specific amplicons compared to the 16S and based on relative number of reads per amplicon (Fig. 5B).Fig. 5. Phylogenetic resolution of taxon-specific and ribosomal DNA amplicons. Taxon-specific amplicon analysis based on 480 wheat leaf samples. A Phylogenetic trees of Pseudomonas amplified sequence variants (ASVs) of the full-length 16S amplicon as well as the Pseudomonas-specific rpoD and transporter amplicons. ASVs are colored by Pseudomonas species. B Total number of ASVs detected by the full-length 16S amplicon, and the Pseudomonas-specific rpoD and transporter amplicons relative to the total read count per amplicon. C Proportion of ASVs with a BLASTn bitscore above a threshold for Pseudomonas species assignment. The 50% and 95% confidence intervals based on permutations are shown in grey. D Identification of the smallest subtree containing > 90% of the ASVs for each Pseudomonas species shown in (E) and (F). Only species with > 5 ASVs were considered. E Proportion of species in the smallest subtree containing > 90% of the species ASVs assigned to specific species. F Shannon diversity indices of the subtree species composition for the smallest subtree containing > 90% of the ASVs assigned to specific species. G Total number of ASVs detected by the full-length ITS amplicon and Z. tritici-specific amplicons 1 and 2, relative to the total read count per amplicon. H Phylogenetic networks (Splitstree) of Z. tritici ASVs identified using the full-length ITS amplicon and the Z. tritici-specific amplicons
We assigned all ASVs to Pseudomonas species using 1071 available genomes from nine different groups for BLASTn analyses (Supplementary Table 11). The Pseudomonas-specific amplicons produced significantly better matches for species assignment compared to 16S sequences (Fig. 5C). To assess the consistency of species assignments, we extracted the smallest subtree containing > 90% of the ASVs for each Pseudomonas species and examined the proportion of the species within this subtree, as well as the Shannon diversity index of the ASVs matching each species (Fig. 5D–F). Pseudomonas-specific amplicons showed higher proportions and lower Shannon diversity indices compared to the 16S as expected for more accurate species assignments. For this analysis, we considered four species for the 16S, 14 for rpoD, and ten for the transporter amplicon, where the low species number for 16S stems from the requirement of at least five detected ASVs per species (Supplementary Fig. 6). Among the species resolved by the rpoD and the transporter amplicons, but failing with the 16S, were subspecies of P. chlororaphis and various species from the P. syringae group (Fig. 5A and Supplementary Fig. 5).
For Z. tritici, we found 869 and 757 ASVs for the chromosome 9 and 13 amplicon, respectively, compared to 13 ASVs obtained by ITS (Fig. 5G and H). Based on the total number of reads per amplicon, this represents a more than tenfold increase of ASVs for the Z. tritici-specific amplicons. These findings indicate a remarkably high strain diversity within a single field in agreement with previous studies [55, 78].
High-resolution tracking of natural plant colonization by pseudomonads
A major aim of high-resolution microbiome analyses is to resolve strain-level interactions. To assess the power of the newly developed amplicons, we performed a hierarchical sampling of 480 samples across space and time tracking expected microbiome shifts in a wheat field. Specifically, sampling was conducted at five different timepoints during the wheat growing season, from May (first node appearance) to July (prior to harvest). At each timepoint, leaves were sampled at three different canopy heights and from different wheat cultivars to capture developmental patterns of the plant.
To evaluate the consistency of Pseudomonas species abundance patterns based on either the rpoD or transporter amplicon, we calculated correlation coefficients across samples for each species (Fig. 6A, Supplementary Fig. 7 and Supplementary Table 13). The observed correlations were compared to null expectations derived from permutations. Using Spearman and Pearson correlation coefficients, we found that 68% and 77% of species correlations, respectively, exceeded the 95% confidence interval of the null distribution, indicating strong concordance between the two amplicons for most species. Notably, higher inter-amplicon correlations were not generally associated with higher total species abundance (Fig. 6A, Supplementary Fig. 7 and Supplementary Table 13). However, species with low correlation values (falling below the 95% confidence interval of the null expectation) consistently exhibited low total abundance (0–40th percentile) in at least one amplicon. Furthermore, we examined the relationship between correlation strength and within-species diversity (measured as the number of ASV variants per species). Species with low inter-amplicon correlations were typically represented by fewer than five ASVs, suggesting that limited sequence diversity may contribute to inconsistent abundance estimates.Fig. 6. Differential abundance of wheat phyllosphere Pseudomonas species across season and canopy height. A** Spearman correlation of Pseudomonas species abundance across samples between the Pseudomonas-specific rpoD and transporter amplicons. The grey dotted line indicates the 95% confidence interval from permutation-based null models (1000 iterations). Observed correlations exceeding this threshold suggest non-random community patterns. B Relative abundances of P. chlororaphis subspecies across time and canopy heights revealed by the Pseudomonas-specific rpoD and transporter amplicons
Next, we investigated species abundance changes across time and space. We identified distinct abundance patterns among closely related species groups and subspecies. For example, we identified variable abundance patterns throughout the season and across canopy heights in both amplicons for different subspecies of P. chlororaphis (Fig. 6B). P. chlororaphis subsp. aurofaciens and subsp. piscium were most abundant in July, whereas P. chlororaphis and P. chlororaphis subsp. chlororaphis were most abundant in May and June. P. chlororaphis subsp. aurofaciens, subsp. piscium and subs. chlororaphis were most prominent at the bottom of the canopy, whereas P. chlororaphis was most abundant on upper leaves.
Within-species diversity of a major wheat pathogen
Genetic diversity within pathogen species can underpin rapid breakdowns of fungicide efficacy or host resistance [79, 80]. However, how genotypic diversity changes over pathogen life cycles remains largely unknown [81, 82]. Here, we examined changes in genotypic diversity across the epidemic phase for the fungal pathogen Z. tritici using two species-specific amplicons (excluding ITS due to its low resolution, with only 13 detected ASVs). Our large-scale field experiment involved natural infection of leaves by a genetically diverse local Z. tritici population, allowing the epidemic to progress without intervention. Widespread Z. tritici infection was observed, with host damage serving as the primary indicator of disease severity. The two Z. tritici-specific amplicons showed consistent numbers of ASVs per leaf throughout the season, indicating minimal turnover in pathogen diversity within a single epidemic phase (Fig. 7A). Similarly, the number of ASVs did not differ significantly across different canopy heights (Kruskal–Wallis test, p > 0.05, Fig. 7A). Tracking individual ASVs across the season, we observed two distinct groups based on their abundance patterns (Fig. 7B): (1) a small number of highly abundant strains persisting throughout the entire growing season, and (2) numerous transient, low-abundance strains detected only at specific timepoints. Even though the two amplicons cannot be directly matched at the individual strain level due to the lack of reference genomes spanning both loci (except for a few reference strains), the consistent trends observed for both amplicons suggest that the microbial community dynamics can be robustly tracked.Fig. 7. Genotypic diversity of the wheat phyllosphere pathogen Z. tritici across season.** A** Number of unique Z. tritici amplicon sequence variants (ASVs) per leaf across timepoints, based on the Z. tritici-specific amplicons 1 (on chromosome 13) and 2 (on chromosome 9). B Heatmap of the 100 most abundant Z. tritici ASVs clustered based on their abundance pattern across the season
Pangenome-informed amplicon templates for additional taxa
We assessed the potential to generate high-resolution amplicons for other ecologically relevant taxa. We used the same principle as for Pseudomonas and Z. tritici to define a representative set of high-quality genomes to construct a pangenome, and to search for core regions conserved within the taxon. Such regions can then be systematically examined for potential amplicon sequences. We generated pangenomes for Rhizobia and Streptomyces following a systematic procedure to allow for comparison (Fig. 8 and Supplementary Fig. 8). We also investigated the important environmental fungal species Aspergillus fumigatus causing opportunistic human infections. The pangenome analyses defined 22 core regions for Rhizobia, 515 for Streptomyces, and 5107 for A. fumigatus, respectively. These findings indicate genomic regions that are suitable for primer design (Fig. 8 and Supplementary Fig. 8). The relatively small number of candidate core regions for Rhizobia correlates with its greater 16S rRNA gene diversity compared with Streptomyces and Pseudomonas. Additionally, all pangenome genes were categorized into orthogroups based on protein homology. Our analysis reveals that the proportion of orthogroups present in all isolates of the pangenome varies from 16 to 84% among the analyzed taxon-specific pangenomes. This range reflects the extent of diversity within the pangenome that our method can accommodate (Fig. 8 and Supplementary Fig. 8).Fig. 8. Pangenome and amplicon templates for the bacterial genera Pseudomonas, Rhizobia, and Streptomyces, as well as the fungal species Z. tritici and A. fumigatus. Pangenomes were each generated based on 19 representative genomes. For each pangenome, the average genome size in Mbp, the number of candidate core segments and the length-normalized ribosomal nucleotide diversity of the 16S or ITS regions is indicated. Genes were grouped into orthogroups (OG) based on protein homology for each pangenome. The total number of orthogroups, the percentage of genes included in orthogroups, the percentage of orthogroups that contain all isolates, and the percentage of orthogroups that contain isolate-specific genes are shown for each species group.
Discussion
We demonstrate that limitations in phylogenetic resolution of microbial community profiling can be overcome using systematically designed taxon-specific 3-kb amplicons. We find that the new loci provide species and strain-level insights into subsets of bacterial and fungal communities coexisting in the plant microbiome exceeding full-length 16S or ITS amplicons by up to an order of magnitude. To demonstrate the utility of our approach, we focus on Pseudomonas and Z. tritici as important interacting microbes in agricultural systems. The broader implications of this work extend beyond plant microbiomes though as taxon-specific amplicon sequencing offers a scalable and cost-effective solutions for resolving microbial diversity in diverse ecosystems, including human-associated microbiomes, environmental microbiomes, and industrial microbiomes.
The pangenome-informed design of amplicons for pseudomonads was optimized to capture diversity specifically and exclusively in this ubiquitous group of bacteria. Compared to full-length 16S amplicons, the new amplicons provide consistently higher confidence in species assignments across clades. The confidence in species assignments stems from the high degree of species resolved to monophyletic clades for most Pseudomonas species. This enhanced resolution enabled us to differentiate Pseudomonas at the subspecies level, which is not feasible using 16S alone. The longer 16S-ITS-23S rRNA operon is thought to offer higher resolution, in silico comparisons demonstrate that our novel taxon-specific amplicons achieve superior resolution even compared to the full rRNA operon. We also focused on the dominant wheat phyllosphere pathogen Z. tritici and achieved a strain-level resolution inaccessible with existing single-locus genotyping approaches. The recovered diversity reflects genotyping resolution typically only achievable by whole-genome or metagenomic sequencing [55, 78]. The main limitations of metagenomic analysis have been the inability to sufficiently remove host DNA before sequencing [7, 83]. This is especially critical for environmental samples where the microbial biomass is low and host contamination is high. These factors have limited metagenomic analyses to the most dominant strains in samples. Because relative abundance in a sample is not correlated with ecological relevance [84], taxon-specific amplicon assays are especially suitable for environments with a highly diverse microbiome and high amounts of host DNA.
We evaluated the discriminant power of the new bacterial and fungal amplicons by tracking microbial communities in the wheat phyllosphere. We contrasted the resolution to the universal barcoding loci 16S and ITS in samples covering different time points and canopy heights. The increased resolution indeed revealed species and strain-specific changes across time and canopy height inaccessible by ribosomal barcoding datasets. One noteworthy finding was the differential abundance patterns observed for the P. chlororaphis subspecies. Subspecies of P. chlororaphis have garnered agricultural interest due to their biocontrol potential, with several strains being utilized in commercial formulations [21, 85]. This potential is attributed to the production of antimicrobial metabolites, notably phenazines, which are not only linked to microbial antagonism but also plant defense upregulation [21, 85]. However, phenazine expression and regulation vary among P. chlororaphis subspecies. Therefore, monitoring P. chlororaphis at the subspecies level is crucial for analyzing antagonistic behavior in the field and prioritizing the most promising biocontrol candidates. The assay developed here is the first to facilitate this. Additionally, we were able to resolve genotypic diversity in a highly diverse fungal pathogen and track changes throughout a single epidemic season. The virulence of different Z. tritici isolates varies greatly, making it interesting to track the genotypic diversity across the epidemic season [86, 87]. Despite the overall large genotypic diversity (> 700 ASVs within a single field), we identified a small set of genotypes that dominated the entire season, which can be informative for control strategies. Our taxon-informed amplicon analyses could also facilitate the construction of strain-specific co-occurrence networks across taxonomic groups. These networks facilitate the interrogation of microbial community network structures and reveal likely ecological interactions, as well as prioritizing biocontrol candidates. However, the patterns of co-occurrence are significantly influenced by the sampling scheme and the network inference method applied [88–90]. Hence, future work should experimentally test synthetic communities to assess interaction profiles predicted from field survey data.
The replicated monitoring of Pseudomonas and plant pathogen diversity with two independent amplicons each enabled us to assess the reproducibility of species and strain identification as well as abundance estimates. We found that certain Pseudomonas species were over- or underrepresented in a particular amplicon and across samples. These discrepancies likely arise from differences in primer binding efficiency and variations in amplification success due to amplicon sequence variability [91]. Our findings align with previous studies demonstrating inconsistencies between different 16S amplicons [92]. Notably, we found that higher inter-amplicon correlation was not linked to higher total species abundance, suggesting that even less abundant species exhibited reproducible detection. The combined use of two independent 3-kb amplicons for Pseudomonas, supplemented by 16S ribosomal DNA controls, establishes a powerful toolset to track the reproducibility of microbiome community assessments across samples and taxonomic groups. This approach underscores the importance of implementing technical controls in quantitative microbiome studies to distinguish biological variation from methodological artifacts.
In our proof-of-principle demonstration, we multiplexed nearly 10,000 reactions in a single sequencing run. This achievement was made possible by recent improvements in Sequel II. The quality of the sequences was high enough to recover sequences at an impressive rate using asymmetric barcoding, which allowed us to multiplex nearly 10,000 samples with the standard set of barcodes. Therefore, the level of multiplexing in this workflow can be determined solely by the number of reads desired per sample. With taxon-specific amplicons, the sample complexity is lower, making higher multiplexing desirable. Additionally, we enhanced the efficiency and reproducibility of sample handling through robotic liquid handling. The PacBio HiFi sequencing used in this study provides high-quality long reads with an accuracy exceeding 99% (Q20) after circular consensus sequencing. This accuracy is comparable to or better than that of Illumina sequencing, significantly reducing the need for conservative quality control measures [15].
The high quality of PacBio HiFi reads enables detailed microbial community profiling using long amplicons. With many high-quality genomes available, we can leverage pangenomes for more integrative analyses of genetic diversity within taxonomic groups and identify the most informative amplicons. Despite the increasing use of pangenomes to analyze genetic diversity, they have not been utilized for amplicon design to the best of our knowledge [93–95]. With the rapidly growing number of genomes, this approach has broad applicability. We demonstrated this by calculating the pangenome for the bacterial groups Rhizobia and Streptomyces, as well as for the fungal species A. fumigatus, and identified an array of taxon-specific candidate regions suitable for primer design. Future work could systematically examine the consistency of candidate regions across different genera by analyzing their diversity ranks relative to taxonomic divergence thresholds. Such cross-genera validation would help determine whether the most informative loci identified here generalize to other microbial groups, further refining the framework’s spectrum of applications. A recent study on Streptomyces taxonomy found that 16S rRNA sequence variation does not reliably delineate Streptomyces species [96]. The study advocates for alternative markers and suggests a reclassification of Streptomyces taxonomy, historically based on 16S data. Rhizobia is a highly diverse bacterial group characterized by its ability to fix atmospheric nitrogen. Species delineation often involves multilocus sequence analysis using housekeeping genes [97–99]. Longer, high-resolution amplicons could track community dynamics in soil and root nodules, revealing species competitiveness in nodule occupancy. A. fumigatus is both an opportunistic human fungal pathogen as well as an environmental saprobe. Despite extensive genomic analysis of the species, the diversity in environmental versus clinical niches is still poorly understood [100, 101]. Similar to other fungal species, metagenomic analysis for A. fumigatus remains challenging due to the small fraction of fungal DNA in samples and the significantly larger genome compared to bacteria [102]. High-resolution taxon-specific amplicons could provide a suitable approach to gain strain resolution of environmental and clinical samples, enhancing our understanding of strain competitiveness across different environments. Our open-source bioinformatics pipeline, encompassing pangenome construction and diversity-based amplicon design, simplifies the design process for the community. While a fully automated tool for taxon-specific amplicon design would further enhance accessibility, several challenges persist. These include the crucial step of selecting representative genomes, which necessitates expert judgment beyond the scope of automated software, and the inherent difficulties in automating the inspection of amplicon candidates and the design of degenerate primers.
Overall, our approach yielded similar costs per high-quality sequence as the commonly used 16S and ITS Illumina-based amplicon sequencing. This gain in throughput opens cost-effective avenues to explore microbial community dynamics with complex experimental designs and in heterogeneous environments. Furthermore, designing, optimizing, and multiplexing long amplicons for other ecologically relevant taxa is clearly in reach with the availability of high-quality genomes. In conclusion, our work shows that long-read amplicon sequencing based on purpose-designed taxon-specific amplicons overcomes limitations in phylogenetic resolution associated with ribosomal amplicon sequencing.
Supplementary Information
Supplementary Material 1.Supplementary Material 2.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Song S, Liu Y, Wang NR, Haney CH. Mechanisms in plant-microbiome interactions: lessons from model systems. Curr Opin Plant Biol. 2021 [cited 2022 Jun 5];62. Available from: https://pubmed.ncbi.nlm.nih.gov/33545444/.10.1016/j.pbi.2021.10200333545444 · doi ↗ · pubmed ↗
- 2Xu L, Pierroz G, Wipf HML, Gao C, Taylor JW, Lemaux PG, et al. Holo-omics for deciphering plant-microbiome interactions. Microbiome. 2021 [cited 2022 Jun 5];9. Available from: https://pubmed.ncbi.nlm.nih.gov/33762001/.10.1186/s 40168-021-01014-z PMC 798892833762001 · doi ↗ · pubmed ↗
- 3Choi K, Khan R, Lee SW. Dissection of plant microbiota and plant-microbiome interactions. J Microbiol. 2021 [cited 2022 Jun 5];59:281–91. Available from: https://pubmed.ncbi.nlm.nih.gov/33624265/.10.1007/s 12275-021-0619-533624265 · doi ↗ · pubmed ↗
- 4Trivedi P, Leach JE, Tringe SG, Sa T, Singh BK. Plant–microbiome interactions: from community assembly to plant health. Nat Rev Microbiol. 2020;18:607–21. Available from: 10.1038/s 41579-020-0412-1.10.1038/s 41579-020-0412-132788714 · doi ↗ · pubmed ↗
- 5Liu YX, Qin Y, Chen T, Lu M, Qian X, Guo X, et al. A practical guide to amplicon and metagenomic analysis of microbiome data. Protein Cell. 2021 [cited 2022 May 25];12:315–30. Available from: https://pubmed.ncbi.nlm.nih.gov/32394199/.10.1007/s 13238-020-00724-8PMC 810656332394199 · doi ↗ · pubmed ↗
- 6Hrovat K, Dutilh BE, Medema MH, Melkonian C. Taxonomic resolution of different 16S r RNA variable regions varies strongly across plant-associated bacteria. ISME Communications. 2024;4(1):ycae 03410.1093/ismeco/ycae 034PMC 1098083138559569 · doi ↗ · pubmed ↗
- 7Earl JP, Adappa ND, Krol J, Bhat AS, Balashov S, Ehrlich RL, et al. Species-level bacterial community profiling of the healthy sinonasal microbiome using Pacific Biosciences sequencing of full-length 16S r RNA genes. Microbiome. 2018 [cited 2022 May 24];6. Available from: https://www.pmc/articles/PMC 6199724/.10.1186/s 40168-018-0569-2PMC 619972430352611 · doi ↗ · pubmed ↗
- 8Wagner J, Coupland P, Browne HP, Lawley TD, Francis SC, Parkhill J. Evaluation of Pac Bio sequencing for full-length bacterial 16S r RNA gene classification. BMC Microbiol. 2016 [cited 2022 May 25];16:1–17. Available from: https://pubmed.ncbi.nlm.nih.gov/27842515/.10.1186/s 12866-016-0891-4PMC 510982927842515 · doi ↗ · pubmed ↗
