Natural variation of the streptococcal group A carbohydrate biosynthesis genes impacts host–pathogen interaction
Kim Schipper, Sara M. Tamminga, Nicholas Murner, Matthew Davies, Paul Berkhout, Debra E. Bessen, Astrid Hendriks, Natalia Korotkova, Yvonne Pannekoek, Nina M. van Sorge

TL;DR
This study explores how genetic variations in Streptococcus pyogenes affect its carbohydrate structure and interactions with the human host.
Contribution
The study provides a global analysis of genetic variation in the gacA-L cluster and identifies functional consequences of rare inactivating mutations in gacH.
Findings
Only 1.3% of S. pyogenes isolates had premature stop codons in gac genes.
gacH mutations led to loss of GacH function and altered host interactions.
Variants with gacH mutations showed resistance to human bactericidal enzymes and zinc toxicity.
Abstract
Streptococcus pyogenes is a leading cause of infection-related mortality in humans globally. The characteristic cell wall-anchored group A carbohydrate (GAC) is expressed by all S. pyogenes strains and consists of a polyrhamnose backbone with alternating N-acetylglucosamine (GlcNAc) side chains, of which 25% are decorated with glycerol phosphate (GroP). The genes in the gacA-L cluster are critical for GAC biosynthesis, with gacH-L being responsible for the characteristic GlcNAc–GroP decoration, which confers the agglutination in rapid test diagnostic assays and contributes to S. pyogenes pathogenicity. Historical research papers described S. pyogenes isolates, so-called A-variant strains, that lost the characteristic GlcNAc side chain following serial animal passage. Genomic analysis of a single viable historic parent/A-variant strain pair revealed a premature inactivating stop codon in…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5| Gene | Gene size* (nt) | Protein size (aa) | Alleles ( | Unique protein ( | Unique stop codon variant ( | Strains with a stop codon in |
|---|---|---|---|---|---|---|
|
| 855 | 284 | 122 | 62 | 1 | 1 (0.05) |
|
| 1,155 | 384 | 173 | 108 | 1 | 1 (0.05) |
|
| 933 | 311 | 134 | 67 | 1 | 1 (0.05) |
|
| 804 | 267 | 90 | 35 | 1 | 1 (0.05) |
|
| 1,206 | 402 | 184 | 64 | 3 | 3 (0.15) |
|
| 1,008 | 335 | 101 | 52 | 1 | 1 (0.05) |
|
| 1,746 | 582 | 224 | 134 | 2 | 2 (0.1) |
|
| 2,475 | 824 | 284 | 185 | 6 | 12 (0.6) |
|
| 696 | 231 | 85 | 40 | 2 | 2 (0.1) |
|
| 342 | 113 | 33 | 13 | 0 | 0 |
|
| 1,287 | 428 | 135 | 75 | 1 | 1 (0.05) |
|
| 1,497 | 498 | 180 | 101 | 3 | 3 (0.15) |
| Id | Isolate | Country | Continent | Year∗ | Primary_disease† |
| ST (MLST) |
| nt position‡ | aa (full/trunc)§ |
|---|---|---|---|---|---|---|---|---|---|---|
| 6056 | Manfredo | USA | North America | 1952 | ARF | 99 |
| 502 | 167/284 | |
| 4281 | NS53 | Australia | Oceania | 1991 | Skin/soft tissue infection, NOS | 318 |
| 838 | 311/231 | |
| 6085 | MEW427 | USA | North America |
|
| 39 |
| 694 | 267/100 | |
| 4285 | NS80 | Australia | Oceania | 1992 | Skin/soft tissue infection, NOS | 1063 |
| 301 | 402/50 | |
| 4289 | NS3 | Australia | Oceania | 1990 | Invasive, NOS | 1076 |
| 151 | 402/89 | |
| 4521 | GAS_dogM3 | Australia | Oceania | 1990 |
| 10 |
| 268 | 402/128 | |
| 4782 | 33022V1T1 | Fiji | Oceania | 2006 | Pharyngitis and/or tonsillitis | 320 |
| 385 | 335/144 | |
| 5413 | K8955 | Kenya | Africa | 2002 | Skin/soft tissue infection, NOS | 778 |
| 433 | 582/167 | |
| 5442 | K3525 | Kenya | Africa | 1998 | Invasive, NOS | 100 |
| 502 | 582/182 | |
| 4791 | 33131V1T1 | Fiji | Oceania | 2006 | Pharyngitis and/or tonsillitis | 1013 |
| 547 | 824/14 | |
| 4788 | 33112V1T1 | Fiji | Oceania | 2006 | Pharyngitis and/or tonsillitis | 1024 |
| 43 | 824/309 | |
| 4797 | 33181V1T1 | Fiji | Oceania | 2006 | Pharyngitis and/or tonsillitis | 268 |
| 928 | 824/309 | |
| 5276 | K45527 | Kenya | Africa | 2010 | Invasive, NOS | 724 |
| 928 | 824/309 | |
| 5500 | K4656 | Kenya | Africa | 1999 | Skin/soft tissue infection, NOS | 727 |
| 928 | 824/312 | |
| 4521 | GAS_dogM3 | Australia | Oceania | 1990 |
| 10 |
| 937 | 824/329 | |
| 4285 | NS80 | Australia | Oceania | 1992 | Skin/soft tissue infection, NOS | 1063 |
| 988 | 824/753 | |
| 5206 | MTB313 | Japan | Asia |
| Meningitis | 28 |
| 2260 | 824/773 | |
| 4600 | NGAS015 | Canada | North America | 2012 |
| 297 |
| 2320 | 824/773 | |
| 4790 | 33129V1T1 | Fiji | Oceania | 2006 | Pharyngitis and/or tonsillitis | 1078 |
| 2320 | 824/773 | |
| 5136 | STAB901 | France | Europe | 2009 | Invasive, NOS | 178 |
| 2320 | 824/773 | |
| 6132 | 20162146 | USA | North America | 2015 | Invasive, NOS | 101 |
| 2320 | 824/773 | |
| 6215 | 20155615 | USA | North America | 2015 | Invasive, NOS | 308 |
| 2320 | 824/773 | |
| 5459 | K5851 | Kenya | Africa | 2000 | Invasive, NOS | 755 |
| 424 | 231/141 | |
| 5454 | K17786 | Kenya | Africa | 2006 | Meningitis | 251 |
| 460 | 231/153 | |
| 5205 | M1_476 | Japan | Asia | 1994 | Invasive, NOS | 28 |
| 103 | 428/34 | |
| 5207 | MTB314 | Japan | Asia |
| Meningitis | 28 |
| 106 | 498/35 | |
| 4336 | NS488 | Australia | Oceania | 1995 | Invasive, NOS | 1019 |
| 682 | 498/227 | |
| 4785 | 33087V1T1 | Fiji | Oceania | 2006 | Pharyngitis and/or tonsillitis | 176 |
| 1204 | 498/401 |
- —http://dx.doi.org/10.13039/100010269 Wellcome Trust
- —http://dx.doi.org/10.13039/100006492 Division of Intramural Research, National Institute of Allergy and Infectious Diseases
- —http://dx.doi.org/10.13039/501100001826 ZonMw
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsStreptococcal Infections and Treatments · Antimicrobial Resistance in Staphylococcus · Plant Pathogenic Bacteria Studies
Data Summary
All Streptococcus pyogenes genome sequences used for this analysis are available within the publication by Davies et al. (2019), ‘Atlas of group A streptococcal vaccine candidates compiled using large-scale comparative genomics’ Nature Genetics, 51(6): 1035–43.
The raw reads for A-variant strain D315/87/3 (emm58; [1]) are available in Sequence Read Archive under the project number PRJNA1234579. The complete genome sequence of S. pyogenes strain D315 (also referred to as NCTC10876) is available through GenBank accession: LS483360.
Introduction
*Streptococcus pyogenes *is a human-restricted pathogen, responsible for hundreds of millions of infections globally each year [23]. S. pyogenes causes a wide spectrum of clinical manifestations ranging from pharyngitis and impetigo to invasive infections, e.g. puerperal sepsis, necrotizing fasciitis and streptococcal toxic shock syndrome. Moreover, repeated infections can result in the development of post-infectious sequelae such as glomerulonephritis and rheumatic heart disease. Together, infections by this single pathogen result in an estimated 500,000 deaths annually worldwide, with resource-limited areas and Indigenous populations being affected disproportionately [24]. Developing new strategies for effective treatment and prevention of S. pyogenes infection and its complications remains a critical public health priority, as safe and effective vaccines are not yet available.
An important feature of the cell wall of S. pyogenes is the group A carbohydrate (GAC) [5]. The GAC is expressed by all S. pyogenes strains, which has resulted in the application of GAC-reactive rapid diagnostic test kits for streptococcal group identification. The GAC glycopolymers comprise up to half of the cell wall mass and are composed of a linear polyrhamnose chain decorated with N-acetylglucosamine (GlcNAc) side chains [67]. Recently, researchers showed that 25% of GlcNAc side chains are further modified with negatively charged glycerol phosphate (GroP) moieties [8]. GAC is important for the structural integrity of the bacterial cell wall, and biosynthesis of the polyrhamnose backbone is essential for the viability of S. pyogenes [6,911]. Removal of the GlcNAc–GroP side chain resulted in increased in vitro killing by human whole blood, neutrophils and platelet releasate and attenuated virulence in murine and rabbit infection models [12], whereas removal of just the GroP moiety resulted in resistance against human cationic antimicrobial proteins, human bactericidal enzyme group IIA-secreted phospholipase (hGIIa), lysozyme and histones, while increasing susceptibility to zinc [813].
The 12-gene gac cluster is crucial for GAC biosynthesis and exhibits limited genetic variation across the S. pyogenes population [121415]. The first seven genes of the cluster (gacABCDEFG) encode proteins catalysing the biosynthesis and transport of the polyrhamnose backbone [91216]. GacIJKL, encoded by gacIJKL, are involved in the decoration of the polyrhamnose backbone with the GlcNAc side chain. Finally, GacH is the GroP transferase enzyme that cleaves membrane phosphatidylglycerol to release and attach GroP to C-6 of the GlcNAc side chains [8]. Interestingly, historic research reported isolates, referred to as ‘A-variants’, in which GAC lost its characteristic GlcNAc side chain after serial passage in mice and rabbits [1718]. However, the underlying mechanisms responsible for the loss of the GAC GlcNAc side chain were never reported. Additionally, this A-variant phenotype was not detected among a stock collection of human isolates [18].
With the current availability of whole-genome sequences, we aimed to re-examine the concept that strains with a non-canonical GAC may arise in humans. To this end, we first analysed the genomic sequences of a historic S. pyogenes A-variant/parent strain pair to pinpoint the genetic alteration that could explain the loss of the GlcNAc side chain in the evolved A-variant strain. Additionally, we analysed the genetic variation of gacA-L in a collection of 2044 S. pyogenes genomes and identified potential inactivating mutations in gacH. We investigated the impact of these mutations on GacH function by complementing a gacH-deficient strain with plasmid-expressed truncated gacH variants and subsequent detection of GroP by biochemical, phenotypical and functional analysis.
Methods
Plasmids, bacterial strains and culture conditions
All plasmids and S. pyogenes strains used in this study for wet-lab experiments are listed in Table S1, available in the online Supplementary Material. *Escherichia coli *was grown in lysogeny broth (LB) (Oxoid) medium or LB agar plates, supplemented with 500 µg ml^−1^ erythromycin at 37 °C. S. pyogenes strains were grown on Todd Hewitt supplemented with 0.5% yeast (THY, Oxoid) agar plates or in THY broth supplemented with 5 µg ml^−1^ erythromycin for the plasmid-complemented strains. No antibiotics were added for the WT strains and the gacH knockout. Overnight cultures were grown at 37 °C, without CO_2_, subcultured the next day in fresh THY and grown to mid-exponential growth phase, corresponding to an OD at 600 nm (OD_600_) of 0.4.
Antibody ab9191 binding assay
S. pyogenes strains were grown to mid-log phase in THY (OD_600_ 0.4), centrifuged, resuspended in PBS containing 0.1 % BSA (Sigma) and incubated with 1 µg ml^−1^ ab9191, a goat polyclonal GAC antibody (Abcam), for 20 min at 4 °C. After washing, bacteria were resuspended in PBS 0.1 % bovine serum albumin (BSA) containing Protein G Alexa Fluor 488 conjugate (1:1,000, Thermo Fisher Scientific P11065). Fluorescence was analysed by FACSCanto II Flow Cytometer (BD Bioscience). Per sample, 10,000 gated events were collected, and fluorescence was expressed as the geometric mean.
Whole-genome sequence comparison of D315 and D315/87/3
The whole-genome sequences of an A-variant/WT strain pair available from the historical Lancefield streptococcal collection were compared to determine genomic changes underlying the loss of GlcNAc side chain in GAC. The A-variant strain D315/87/3 (emm58 [1];) was sequenced by the Genomics Technologies Facility in Lausanne, Switzerland, using NovaSeq 6000. The raw reads have been uploaded to Seqeunce Read Archive under the project number PRJNA1234579. Raw reads were processed with Trimmomatic v.039 [19] to remove adaptor sequences and bases of insufficient quality with parameters ILLUMINACLIP:TruSeq3-SE:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36. Trimmed reads were then de novo assembled into scaffolds with SPAdes v3.15.5 [20] and assembly quality was assessed with QUAST v5.0.2. SNP differences between parent strain D315 (also referred to as NCTC10876; GenBank accession: LS483360) and D315/87/3 were detected with Snippy v4.6.0 [21] (https://github.com/tseemann/snippy) with default parameters.
Group A Streptococcus analysis of the gac gene cluster
Whole-genome sequences (Illumina short-read) of 2,044 S. pyogenes isolates were uploaded to the open-access PubMLST database (www.pubmlst.org) [1422]. gacA-gacL were annotated and screened for genetic variation, where allele 1 was assigned to the reference strain MGAS5005 [23]. Strains of which one of the gac genes was truncated because it was located at the end of a contig (n=23) were excluded from the analysis. The sequence of allele 1 was blasted against the genomes of strains without an allele number using the PubMLST database to identify sequences with a premature stop codon. The default blast settings were used: blastn word size, 11; blastn scoring: reward, 2; penalty, −3; gap open, 5; gap extend, 2; hits per isolate, 1; flanking length (bp), 0.
A 14,416 bp region containing each gac gene, as well as the intergenic regions upstream of gacA, gacB and gacI, was detected in each isolate with blastn [24] against reference genome MGAS5005 (accession number: CP000017) with default parameters. Alignment start and end region and subject strand information were used to extract sequences with bedtools getfasta [25], reverse complementing when located on the antisense strand. A multiple sequence alignment was constructed with mafft [26] and input into snp-sites to determine polymorphic regions [27]. The resulting variant call format file was annotated with snpEff [28] and its pre-built database Streptococcus_pyogenes_mgas5005 to determine synonymous and non-synonymous SNPs of coding regions.
Generation of complemented S. pyogenes ΔgacH strains with gacH variant
All primers used in this study are listed in Table S2. Expression vector pDCerm_gacH (gacH allele 1) was isolated from E. coli MC1061. The plasmid was digested with EcoRI and BglII. For complementation of the S. pyogenes ΔgacH knockout strain with the different gacH variants, different strategies were used. To create gacH complementation plasmids that contained a premature stop codon at nucleotide nt 928 and 937, sequences were codon optimised, and gblocks were ordered. The gblocks were amplified with primers STOP309AAF/ STOP309AAR or STOP312AAF/STOP312AAR, digested with EcoRI and BglII and ligated into EcoRI/BglII-digested pDCerm. Correct insertion was confirmed by Sanger sequencing with primers pDCermF, pDCermR, gacH309_checkF1, gacH309_checkR1 or pDCermF, pDCermR, STOP312checkF and STOP312checkR. For complementation with gacH containing a premature stop codon at nt position 2320, a gblock could not be generated, even after codon optimisation. Therefore, WT S. pyogenes strain 20162146, which was identified to contain this gacH stop codon, was obtained from the Centers for Disease Control (Atlanta, GA, USA). The gacH gene was amplified with primers gacHEcoRIF and gacHBglIIR, followed by a similar procedure as described for the gblocks. The correct insert was confirmed with primers gacHcheck1, gacHcheck2, gacHcheck3, pDCermF and pDCermR. Plasmids with the correct inserts were transformed into S. pyogenes 5448ΔgacH.
Measurement of relative phosphate concentrations on GAC
S. pyogenes cell wall was isolated from late exponential phase cultures (OD_600_~0.8) by the SDS-boiling procedure as previously described [1329]. Purified cell wall samples were lyophilized and stored at −20 °C before the analysis. GAC was released from cell wall preparations by mild acid hydrolysis as previously described [13]. After hydrolysis, samples were purified by running over a PD-10 desalting column (VWR, 17-0851-01), with de-ionized water as the exchange buffer. Phosphate was released from GAC as outlined [8]. Briefly, soluble GAC was incubated with 2 N HCl at 100 °C for 2 h to cleave GroP. Samples were neutralized with NaOH in the presence of 62.5 mM HEPES pH 7.5. To release phosphate from GroP, samples (100 µl) were incubated with 2 µl of 1 U µl^−1^ alkaline phosphatase (New England Biolabs; Quick CIP) in alkaline phosphatase buffer (New England Biolabs; rCutSmart buffer) at 37 °C overnight. Phosphate concentrations were measured using the malachite green method. The reactions were diluted to 160 µl with water, and 100 µl was transferred to a flat-bottom 96-well culture plate. Malachite Green reagent (0.2 ml) was added, and the absorbance at 620 nm was read after 10 min at room temperature. Malachite Green reagent contained one volume of 4.2% ammonium molybdate tetrahydrate (by weight) in 4 M HCl, three volumes of 0.045% malachite green (by weight) in water and 0.01% Tween 20. Phosphate concentrations were determined using a phosphate standard curve. Concentrations of rhamnose (Rha) in GAC were measured by an anthrone assay as previously described [13]. The concentrations of phosphate were normalized to total Rha content and presented as a percentage of the ratios in the WT strain.
Human group IIA-secreted phospholipase A2 killing assay
hGIIa killing was determined as described before [30]. Briefly, S. pyogenes strains were grown to mid-log phase (OD_600_ 0.4), diluted 1,000-fold in HEPES solution [20 mM HEPES, 2 mM Ca^2+^, 1 % BSA (pH 7.4)] with or without 0.5 µg ml^−1^ hGIIa. Samples were incubated for 2 h at 37 °C and viability was assessed using serial dilutions plating on blood agar plates. The next day, colony forming units (c.f.u.) were counted. The percentage survival was calculated as follows:
Zinc susceptibility assay
S. pyogenes strains were grown to mid-log phase in THY; cultures were diluted in fresh THY to an OD_600_ of 0.1. An equal volume of 2.5 mM ZnSO_4_ in THY was added in a 96-well plate (technical duplicates), and cultures were incubated at 37 °C. After 20 h, serial dilutions were plated for c.f.u. determination.
Statistical analysis
Flow cytometry data were analysed using FlowJo v.10 (FlowJo). Data were analysed using GraphPad Prism 10.2.0 (GraphPad software). Statistical significance was tested using an unpaired t-test or a one-way ANOVA, followed by a Dunnett’s test for multiple comparisons. The P values are depicted in the figures or mentioned in the caption, and P<0.05 was considered significant.
Results
We obtained a viable parent strain (D315) and animal-passaged A-variant strain (D315/87/3) from the historical Lancefield streptococcal collection (Table S1). The A-variant phenotype of strain D315/87/3 was confirmed by the absence of StrepTex latex agglutination (not shown) and loss of binding of a GAC-GlcNAc reactive polyclonal antibody (Figs 1 and S1). Next, we compared whole-genome sequences of these strains to pinpoint the underlying genetic defect. We identified 141 SNPs, of which 115 were identified in coding regions and were disruptive, i.e. premature stop codon or switch to non-synonymous or frameshift mutations (Table S3). Within the gac cluster, we identified a nucleotide substitution C382T in the gacI sequence of D315/87/3 (A-variant). GacI is the glycosyltransferase encoded by gacI, which is critical for expression of the GAC GlcNAc side chain [1231]. This mutation resulted in a premature stop codon at nt position 382, leading to a truncated protein of 127 aa instead of 231, thereby reducing GacI size by 55%, which likely results in a non-functional protein. Therefore, the identified gacI mutation likely explains the absence of GlcNAc in the historic mouse-passaged A-variant D315/87/3.
*Historical A-variant lacks GlcNAc side chain. Binding of goat polyclonal S. pyogenes GAC antibody ab9191 (1 µg ml−1) to WT S. pyogenes M3, an isogenic gacI mutant, the historic A variant (D315/87/3) and its parent strain (D315). Data are depicted as geometric mean fluorescence intensity (FI) of three individually displayed biological replicates (mean+sd). P values were calculated by two unpaired t-tests; *P<0.01.
To comprehensively analyse the genetic variation in gacA-L, we analysed allelic variation and the presence of specific mutations in these genes in 2,021 S. pyogenes genomes (Table S4) [14]. The number of nucleotide alleles varied between 33 (gacJ) and 284 (gacH) (Table 1). The 14,416 bp region, spanning the gac operon and the 137 bp upstream intergenic regions, contained 1,629 SNPs across 1,568 unique gac positions, with 61 being polymorphic (Table S5). Across non-M1 strains, an average of 57 SNPs per gac cluster per strain were detected relative to the MGAS5005 reference genome; 796 out of 1,629 (48.9%) SNPs were non-synonymous; 745 out of 629 (45.7 %) SNPs were synonymous; and 89/1,629 (5.5 %) were identified in intergenic regions (Fig. 2 and Table S5). Two promoters for the gac gene cluster have been described and confirmed through RNA-sequencing, i.e. one upstream of gacA (nt 60,4738–60,4788 in MGAS5005) and one upstream of gacB (nt 60,5767–60,5817 genome MGAS5005) [32]. We observed 11 different mutations in the promoter region of gacA in 35 (1.7 %) out of 2021 isolates. Only one of these mutations (isolate K41948) was in the predicted −10 region (ATGAAA→ATGAAG). Two mutations were found in the spacer region (isolates NGAS130 and Bra36), and none were found in the −35 region. For the promoter upstream of gacB, 10 different mutations in 88 (4.4 %) out of 2021 strains were identified. None of these mutations were found in the −10, spacer nor −35 predicted regions. Overall, and as expected, the gac gene cluster was genetically highly conserved in the S. pyogenes population.
SNP sites of the gac operon across 2,021 S. pyogenes genome sequences. The 14,416 bp, 12-gene gac operon, including 137 bp upstream, intergenic regions of reference genome MGAS5005 (accession: CP000017) are represented at the bottom. Relative positions of 1,629 SNPs at 1,568 polymorphic sites are represented by bars across the x-axis, with their height representing their frequency across 2,021 S. pyogenes genome sequences. SNPs are categorized into three groups relative to MGAS5005: synonymous mutations resulting in no amino acid change (upper graph), nonsynonymous mutations resulting in an amino acid change (middle graph) and SNPs detected in intergenic regions (bottom graph).
Both the number of alleles and unique proteins of each gac gene correlated strongly with gene size (Fig. 3a, b). gacH displayed the highest number of unique protein sequences (Table 1 and Fig. 3b). Interestingly, for all gac genes except gacJ, we identified variants that contained a premature stop codon (Tables 2 and S5). Overall, we identified 26 isolates (1.3% of all analysed strains) that contained a premature stop codon in one or two of the gac genes. Three isolates (NS534, MTB313 and STAB901) were observed to contain a nucleotide deletion in gacI at position 449, resulting in a premature stop codon. However, Sanger sequencing gacI of STAB901 [33] did not confirm this nucleotide deletion and likely resulted from a sequencing error in a poly(A) tract of 8 nt. Therefore, we did not include this particular gacI ‘stop codon’ in Table 2. In addition, we observed a deletion in gacA at position 414, resulting in a premature stop codon at nucleotide position 502 in isolate Manfredo (Table 2). We confirmed the presence of this mutation by Sanger sequencing. Although this deletion predicts a truncated protein consisting of only 167 of the 284 aa (59%) [9], this strain is still positive in the rapid test agglutination assay (data not shown), suggesting a functional GacA. None of the other premature stop codons were analysed functionally due to a lack of the specific strains.
Unique allelic and protein variants of gacA-L in 2,021 S. pyogenes isolates. The number of (a) unique alleles in and (b) unique protein sequences encoded by each of the gacA-L genes and correlation with gene size.
For 12 (46 %) of the 26 strains, a premature stop codon was found in gacH (Table 2). These isolates did not cluster based on emm typing nor multi-locus sequence typing (MLST). Furthermore, these strains were isolated from different disease manifestations (pharyngitis n=3, skin/soft tissue infection n=2, invasive infection n=4, meningitis n=1 and unknown n=2) and originated from different continents (Oceania n=5, Africa n=2, Asia n=1, North America n=3 and Europe n=1).
GacH modifies the GAC GlcNAc side chain with GroP [8] and consists of a transmembrane domain (Fig. 4; aa 1–395, green) and a catalytic domain on the extracellular side of the membrane (Fig. 4; aa 444–822, red). The premature stop codons were located at nucleotide position 43 (n=1), 928 (n=3), 937 (n=1), 988 (n=1), 2,260 (n=1) and 2,320 (n=5), resulting in truncated proteins of 14, 309, 312, 329, 753 and 773 aa long (Fig. 4). We also checked for the presence of mutations that flank the catalytic site as well as the sites that are important for binding ligand and substrate based on structural information in [8]. We did not identify any amino acid substitutions in these positions, indicating that these sites are 100% conserved.
In silico analysis of premature stop codons in gacH. Scaled 2D representation of gacH (2,475 bp, 824 aa), containing a transmembrane domain (aa 1–395, green) and a catalytic domain (aa 444–822, red). Premature stop codons were identified in 12 isolates and are indicated by the vertical blue arrows that show the amino acid position and frequency.
To study the functional consequences of the premature stop codons in gacH, we expressed three gacH variants (stop codon at nt position 928, 937, and 2,320) on a plasmid in a S. pyogenes mutant lacking gacH (5448ΔgacH). Additionally, we included a WT strain (20162146) that contained the naturally occurring premature stop codon in gacH at amino acid position 773. To determine the presence of GroP on the GAC GlcNAc side chain, we measured the phosphate content in the isolated GAC in the different S. pyogenes strains [8]. As expected, GAC isolated from 5448∆gacH contained a significantly reduced amount of phosphate compared to WT 5448 GAC (Fig. 5a). This phenotype could be restored by complementation with a plasmid containing WT gacH, but not with gacH variants encoding GacH variants truncated from amino acid positions 309, 312 or 773 (Fig. 5a). Similarly, the WT strain 20162146, which naturally acquired the premature stop codon at position 2,320 in gacH, showed strongly reduced levels of phosphate in the isolated GAC (Fig. 5a).
*Biochemical and functional analysis of gacH variants with premature stop codon in S. pyogenes. (a) Analysis of phosphate content in GAC isolated from WT 5448 S. pyogenes, 5448ΔgacH and 5448ΔgacH complemented with plasmid-expressed WT gacH, premature stop codon gacH (on positions 309, 312 or 773) or WT S. pyogenes that naturally acquired a premature stop codon in gacH (strain 20162146). The concentrations of phosphate are relative to S. pyogenes WT 5448. Bars and error bars represent the mean relative phosphate concentrations measured in three biological replicates and the SD, respectively. Survival of all strains mentioned in (a) after exposure to (b) 0.5 µg ml−1 recombinant hGIIA or (c) 1.25 mM Zn2+. Bars and error bars represent the mean percentage survival and sd, respectively (n=3 biological replicates). P values were calculated by one-way ANOVA comparing all strains to 5448ΔgacH; ***<I>P<0.001, ***<I>P<0.0001.
The presence of GroP confers susceptibility of S. pyogenes to the bactericidal enzyme hGIIa and resistance to zinc toxicity [8]. To confirm the functional implications of GroP loss in the complemented strains expressing gacH variants, we determined bacterial survival of these gacH variants in a hGIIa-killing assay and zinc susceptibility assay. Similar to the 5448ΔgacH mutant, all premature stop codon variants displayed resistance towards the bactericidal enzyme hGIIa and increased susceptibility to zinc compared to the isolate that plasmid-expressed WT gacH (Fig. 5b, c). Overall, both the biochemical and functional assays confirmed that gacH variants with an early stop codon in the gene and even a premature stop codon close to the C-terminus (resulting in a protein of 773 aa instead of 824) were defective in modifying GAC with GroP.
Discussion
GAC is a major and characteristic cell wall component of S. pyogenes and plays important roles in bacterial physiology and pathogenesis. Functional and structural analysis of the GAC has been performed for a few well-known S. pyogenes laboratory strains, where deletion of gacI results in reduced survival in human blood and animal models of systemic infection [1215]. Furthermore, removal of gacH renders S. pyogenes susceptible to zinc and resistant to the host cationic antimicrobial peptides including hGIIA [8]. Here, we identified that a mutation in gacI, resulting in a premature stop codon, likely underlies the historical A-variant phenotype that has lost the characteristic GAC GlcNAc side chain after frequent animal passage. Furthermore, by analysing 2021 S. pyogenes genomes, we identified a small number of clinical S. pyogenes isolates that contain a gacH allele with a premature stop codon. By expressing these allelic variants in a gacH-deficient strain, we demonstrated that these genetic variants are severely attenuated in their enzymatic activity, resulting in loss of the GroP side chain and acquiring resistance to hGIIA but susceptibility to zinc.
In this study, we analysed a geographically and clinically diverse collection of S. pyogenes genome sequences, comprising 150 different emm types and 484 MLST types [14], to obtain a more comprehensive overview of the variability of the gac genes across the S. pyogenes population. We uploaded the genome sequences to PubMLST, which comprises an important freely-accessible tool for research on the presence and genetic variation of specific genes at the population level [22]. Our results thereby expand observations from previous work, where variation in gac genes was analysed in 520 of these 2044 S. pyogenes strains [1534]. In these studies, the average number of SNPs was about twofold lower in the gac gene cluster (1 bp per 260 bp) compared to the core genome (1 bp per 133 bp) of the analysed strains, suggesting negative selection. Negative selection was also implied by the ratio of nonsynonymous to synonymous SNPs (dN/dS) of the gac operon, which was 0.24 [14]. In line with these studies, SNP analysis of the strain collection analysed here observed an average of 57 SNPs, equivalent to 1 polymorphism per 260 bp, per gac cluster per genome. Although we did not carry out additional dN/dS analyses, our data also strongly suggest that sequence variation of the gac gene cluster is subject to negative selection, confirming the biological significance of the encoded biosynthesis machinery and GAC itself.
Similar to our study, the study of Henningham et al. [15] also reported the existence of strains with premature stop codons in the gac genes, although these variants were not functionally confirmed. In the expanded data set, we identified 10 isolates with premature stop codons in genes gacA-gacG, which are critical for biosynthesis of the polyrhamnose backbone. In addition, 16 isolates were identified that contained a premature stop codon in genes that are critical for decoration of polyrhamnose with GlcNAc–GroP (gacH-gacL). Twelve (75%) of these 16 isolates contained a premature stop codon in gacH and had a unique MLST profile and, with one exception, a unique emm type.
We aimed to assess the functional consequences of some gac stop codons. Indeed, strain Manfredo contained a premature stop codon in gacA but still agglutinated in the diagnostic latex agglutination test, suggesting a functional GacA protein. However, for a mutation in gacI that introduced a premature stop codon, we could not confirm a defect in GAC GlcNAc expression. Upon re-examination of the gacI sequence by Sanger sequencing, we were not able to confirm the mutation. This is likely related to the presence of a poly-A stretch, which could easily result in a sequencing error. Therefore, we advise caution when interpreting the presence of premature stop codons from whole-genome sequence data without functional validation.
We further analysed the gacH premature stop codons for functional consequences to GAC biosynthesis. In five of the 12 isolates, the premature stop codon resulted in a GacH protein of 773 aa. Three of these strains were isolated from different pathologies and different continents, suggesting that the acquisition of this premature stop codon evolved independently. Despite preserving ~94% of the protein, the stop codon at position 2,320 resulted in the loss of GroP, suggesting that the last 50 aa of the C-terminus of GacH are crucial for its enzymatic activity.
In addition to gene sequence variation, environmental conditions may also affect gacI and gacH expression or enzymatic activity. Historically, it was reported that GlcNAc is present in a 1:2 ratio to the rhamnose backbone [63135]. Furthermore, ~25% of GlcNAc residues contain a GroP group [8]. Possibly, the enzymes implicated in the biosynthesis of the GlcNAc–GroP epitope exhibit different expression levels or activities under different environmental conditions. Whether and how the amount of GlcNAc and GroP present on GAC is regulated remains to be determined.
In conclusion, the gacA-L gene cluster is highly conserved in its presence and genetic sequence. We showed that there are a few exceptions in which gac genes are present but contain a premature stop codon. For three of these gacH stop codon variants, we confirmed that the GAC-GroP modification is lost, which resulted in increased resistance to hGIIA. The high conservation of gac genes and sequence in the S. pyogenes population highlights the essential nature of this molecule for streptococcal survival in the human host. Nevertheless, expression could vary due to transcriptional or post-transcriptional regulation. Understanding these regulatory mechanisms would provide insight into disease pathogenesis, given the importance of the GlcNAc–GroP of GAC for host immune interaction.
Supplementary material
10.1099/mgen.0.001443Uncited Supplementary Material 1.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Scott JR Pulliam WM Hollingshead SK Fischetti VA Relationship of M protein genes in group A streptococci Proc Natl Acad Sci U S A 1985821822182610.1073/pnas.82.6.18223885219 PMC 397365 · doi ↗ · pubmed ↗
- 2Sims Sanyahumbi A Colquhoun S Wyber R Carapetis JR Ferretti JJ et al Global disease burden of group a streptococcus Streptococcus Pyogenes: Basic Biology to Clinical Manifestations University of Oklahoma Health Sciences Center 201626866218 · pubmed ↗
- 3Carapetis JR Steer AC Mulholland EK Weber M The global burden of group A streptococcal diseases Lancet Infect Dis 2005568569410.1016/S 1473-3099(05)70267-X 16253886 · doi ↗ · pubmed ↗
- 4Walker MJ Barnett TC Mc Arthur JD Cole JN Gillen CM et al Disease manifestations and pathogenic mechanisms of group A Streptococcus Clin Microbiol Rev 20142726430110.1128/CMR.00101-1324696436 PMC 3993104 · doi ↗ · pubmed ↗
- 5Lancefield RC The antigenic complex of Streptococcus Haemolyticus: II. chemical and immunological properties of the protein J Exp Med 19284746948010.1084/jem.47.3.46919869424 PMC 2131386 · doi ↗ · pubmed ↗
- 6Mccarty M The lysis of group A hemolytic streptococci by extracellular enzymes of Streptomyces albus. II. Nature of the cellular substrate attacked by the lytic enzymes J Exp Med 19529656958010.1084/jem.96.6.56913022851 PMC 2136167 · doi ↗ · pubmed ↗
- 7Uwe C Kreis VV Pinto BM Application of two-dimensional NMR spectroscopy and molecular dynamics simulations to the conformational analysis of oligosaccharides corresponding to the cell-wall polysaccharide of Streptococcus group A Int J Biol Macromol 19951711713010.1016/0141-8130(95)92678-J 7577810 · doi ↗ · pubmed ↗
- 8Edgar RJ van Hensbergen VP Ruda A Turner AG Deng P et al Discovery of glycerol phosphate modification on streptococcal rhamnose polysaccharides Nat Chem Biol 20191546347110.1038/s 41589-019-0251-430936502 PMC 6470023 · doi ↗ · pubmed ↗
