Mapping the transcriptional regulatory network of a fungal pathogen by exploiting transcription factor perturbation
Dhoha Abid, Holly Brown, Chase Mateusiak, Tamara L. Doering, Michael R. Brent

TL;DR
This study maps the gene regulation network of a deadly fungus and finds that capsule formation is not controlled by a single factor but by multiple overlapping regulators.
Contribution
The paper introduces a network-based approach to identify functionally orthologous transcription factors beyond sequence similarity.
Findings
No single transcription factor primarily regulates capsule formation in Cryptococcus neoformans.
Transcription factors involved in capsule regulation also control many other genes and processes.
Functionally orthologous transcription factors in C. neoformans and S. cerevisiae often differ from those identified by sequence homology.
Abstract
Cryptococcus neoformans is a deadly fungal pathogen. Upon entering a mammalian host, it deploys a voluminous polysaccharide capsule that is necessary for it to survive host defenses and maintain an infection. Capsule expansion is regulated transcriptionally, as deletion of many transcription factors (TFs) alters the capsule. Thus, we set out to map the transcriptional regulatory network of C. neoformans—that is, to identify the TFs that directly regulate each gene in the genome. First, we carried out RNA-seq of 120 single-TF-deletion strains, together with wild-type controls. We then applied NetProphet3, a TF network mapping algorithm, to predict the direct functional targets of each TF. Unexpectedly, analysis of this network indicated that there are no TFs that primarily regulate genes involved in capsule formation. Rather, the TFs that play a role in deploying the capsule also…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Fig 1
Fig 2
Fig 3
Fig 4| TF | # of targets | # of regulators | Capsule-implicated |
|---|---|---|---|
| Bzp5 | 884 | 18 | |
| Nrg1 | 692 | 11 |
|
| Gat201 | 567 | 19 |
|
| Swi6 | 509 | 11 |
|
| 05375 | 475 | 13 | |
| Fzc51 | 402 | 14 | |
| Usv101 | 402 | 12 |
|
| 00514 | 383 | 11 | |
| Cir1 | 334 | 12 |
|
| Rim101 | 328 | 15 |
|
| Ccd6 | 305 | 11 | |
| Rds2 | 284 | 16 |
|
| Sp1 | 282 | 21 |
|
| Ert1 | 231 | 15 |
| TF | TF | Identity | ||
|---|---|---|---|---|
| Fkh2 | Fkh2 | 47% | Cell cycle and DNA replication | Regulation of DNA replication |
| Fzc12 | Mal33 | 40% | Regulation of cell cycle and DNA replication | Carbohydrate transport |
| Mln1 | Rtg3 | 30% | Carbohydrate transport | TCA cycle, amino acid and nitrogen metabolism |
| Rsc8 | Swi3 | 36% | Transcriptional regulation | Transport of ubiquitinated proteins |
| Swi6 | Swi6 | 25% | Regulation of cell cycle | Cell wall and meiosis |
| Yap1 | Yap1 | 48% | Transcriptional regulation | Response to oxidative stress, sterol biosynthesis, response to metal ion, sulfur metabolism |
| Zfc2 | Usv1 | 46% | Peptidyl-lysine modification to peptidyl-hypusine | Metabolism of glycogen and other carbohydrates |
| TF | TF | Identity (%) | E-value | Jaccard similarity | Empirical |
|---|---|---|---|---|---|
| Gat201 | Gat4 | 41 | 2.16E-05 | 0.11 | 0 |
| 01069 | Ppr1 | 50 | 6.36E-05 | 0.10 | 0.01 |
| Fzc45 | Pip2 | 34 | 0.007 | 0.09 | 0.01 |
| Stb4 | War1 | 32 | 0.026 | 0.09 | 0 |
| Cir1 | Gzf3 | 59 | 2.09E-18 | 0.08 | 0 |
| Fzc19 | Tbs1 | 30 | 0.83 | 0.08 | 0.03 |
| Sip401 | Dal81 | 32 | 0.76 | 0.08 | 0 |
| Fzc44 | Mal33 | 52 | 0.17 | 0.07 | 0.04 |
| 06097 | Pdr3 | 48 | 7.40E-02 | 0.07 | 0.02 |
| TF | TF | Identity | E-value | Jaccard similarity | Empirical | Our method | Kelliher et al. |
|---|---|---|---|---|---|---|---|
| Ert1 | OAF1 | 41% | 1.86E-06 | 0.07 | 0.00 | ✔ | |
| Ert1 | Ert1 | 49% | 2.82E-09 | 0.01 | 1.00 | ✔ | |
| Fzc33 | Rgt1 | 33% | 0.046 | 0.07 | 0.01 | ✔ | |
| Fzc33 | Sef1 | 43% | 7.69E-07 | 0 | 1.00 | ✔ | |
| Gat1 | Gat1 | 70% | 3.12E-22 | 0.09 | 0.00 | ✔ | ✔ |
| Mig1 | Mig1 | 68% | 8.48E-27 | 0.09 | 0.00 | ✔ | ✔ |
| Ppr1 | Mal13 | 47% | 0.061 | 0.07 | 0.05 | ✔ | |
| Ppr1 | Ppr1 | 25% | 1.28E-07 | 0.04 | 0.61 | ✔ | |
| Rim101 | ZAP1 | 31% | 5.93E-12 | 0.08 | 0.01 | ✔ | |
| Rim101 | Rim101 | 40% | 7.69E-07 | 0.04 | 1.00 | ✔ | |
| Skn7 | Skn7 | 51% | 5.18E-31 | 0.06 | 0.03 | ✔ | ✔ |
| Skn7 | Hms2 | 28% | 5.48E-08 | 0.03 | 1.00 | ✔ | |
| Usv101 | Msn2 | 59% | 1.41E-11 | 0.1 | 0.00 | ✔ | |
| Usv101 | Usv1 | 62% | 3.59E-20 | 0.07 | 0.09 | ✔ | |
| Usv101 | Rgm1 | 58% | 2.15E-21 | 0.05 | 0.99 | ✔ |
- —National Institute of General Medical Scienceshttp://dx.doi.org/10.13039/100000057
- —National Institute of Allergy and Infectious Diseaseshttp://dx.doi.org/10.13039/100000060
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsFungal Infections and Studies · Fungal and yeast genetics research · Nail Diseases and Treatments
INTRODUCTION
Cryptococcus neoformans is a fungal pathogen that occurs in the environment and may infect mammals, including humans and mice. When infectious particles of C. neoformans are inhaled, they can initiate a lung infection. In immunocompetent individuals, this is generally cleared, although latent infection may occur. In severely immunocompromised individuals, however, the infection persists in the lungs and frequently disseminates to the brain, where it causes potentially lethal meningitis (1, 2). C. neoformans is estimated to be the second most common cause of AIDS-related mortality (3) and is responsible for 147,000 deaths annually (3, 4).
C. neoformans is a basidiomycete, belonging to a phylum that diverged approximately 400 million years ago from the ascomycete lineage, which includes the model yeast Saccharomyces cerevisiae and other fungal pathogens like Candida and Aspergillus (5). It is distinguished by a large polysaccharide capsule that surrounds the cell and is central to its virulence (6). Upon entry into a host or exposure to conditions that mimic the host environment, the capsule expands considerably (7, 8). Acapsular cells are avirulent (9, 10), while larger capsules increase resistance to engulfment by phagocytes during infection (11–13).
We know that capsule production is regulated at the level of transcription (14, 15), yet our understanding of this regulation is limited. One way to define regulatory processes is to generate a transcription factor (TF) network map (16–23). Such maps describe the relationships between each TF in the genome and the genes that it regulates. A map consists of nodes and edges: nodes represent TFs and genes, and an edge exists between a TF and a target gene if the TF regulates that gene. Edges are described as: TF-target.
Many gene regulatory networks are intended to represent functional regulation rather than molecular interactions. In other words, edges indicate relationships between TFs and genes that could be either direct or indirect. Here, we aim to generate a map that represents relationships in C. neoformans that are both direct and functional. In other words, the TF must physically bind to the regulatory DNA of the target gene (so it is direct), and as a result, it must modulate the transcription rate of that target (so it is functional). Importantly, this is not always the case: genes whose promoters are bound by a TF in ChIP-Seq or other binding location assays are frequently unaffected by deleting or overexpressing the TF, which shows that direct interaction does not always indicate a functional relationship (24, 25).
To generate our network, we used NetProphet3, a state-of-the-art method for mapping TF networks that we recently developed (26). NetProphet3 uses as input gene expression profiles of cells with or without perturbation of a TF by, for example, deletion or overexpression. Because they are based on expression data, NetProphet3 maps are functional in the sense that TFs are likely to affect the expression of their predicted target genes. Here, we show that the functional TF-target relationships predicted by NetProphet3 in Cryptococcus are highly enriched for direct interactions, consistent with our findings in Saccharomyces cerevisiae (26). Indeed, we show that NetProphet3 predicts the probability that a TF will bind a target gene in a ChIP-seq experiment with reasonable accuracy, using only gene expression data as input.
In this work, we generated a comprehensive data set of gene expression changes in response to TF perturbation in C. neoformans. Specifically, we carried out RNA-Seq on 120 TF deletion strains (selected as described below) and used the resulting data as input for NetProphet3 to create a TF network map. The map assigns scores to all possible TF-target interactions that represent our confidence in the existence of a direct, functional relationship. After validating this map, we used it to identify functional orthologs of Cryptococcus TFs in S. cerevisiae. We also searched the network for TFs that primarily regulate genes involved in capsule formation but found none, suggesting that the capsule is not deployed independently but rather as part of an arsenal of cellular defense and stress response systems. Our new TF perturbation response data set is available from GEO (GSE297962), the network map we constructed from it may be found at https://doi.org/10.5281/zenodo.17193243, and a visualization is available at https://cryptococcus.net.
RESULTS
Overview of the transcriptome
We generated gene expression data for all of the single-TF deletion strains that we could either generate (27) or obtain from the Madhani C. neoformans deletion collection (28). We grew these 120 strains, each lacking one TF (File S1), along with wild-type (WT) controls, in standard laboratory conditions for rich medium growth (YPD, room air, 30°C). We then transferred them to capsule-inducing conditions for 90 minutes and measured gene expression levels by RNA-Seq. The capsule-inducing conditions we used, which are designed to mimic the host environment, include mammalian tissue culture medium (Dulbecco’s Modified Eagle Medium), 37°C, and 5% CO_2_; these conditions are known to result in cells with large capsules (27). We chose a 90-minute time point to reflect early changes in transcription and because it has yielded insights into cryptococcal gene regulation in our previous studies (14, 27).
To enable rigorous analyses, we carried out RNA-Seq in a strictly controlled manner and with multiple biological replicates for each TF deletion strain (average = 4, Fig. 1A; File S1). We also included WT control cultures along with TF deletion strains in each experiment, totaling 122 WT samples. Under our conditions, we detected expression of 6,746 genes, which included 163 of the 165 TFs in C. neoformans. Figure 1B shows the distribution of expression levels for all TFs in WT cells, while Fig. S1A provides the values for individual TFs. The most highly expressed TF genes were RIM101 (29) and BZP1 (30). Most TFs were measurably expressed in capsule-inducing conditions (Fig. 1B), so we expected that deleting the corresponding genes could alter the expression of their targets.
(A) Histogram of the number of RNA-seq biological replicates performed for TF deletion strains. (B) Histogram of the expression levels of TFs (normalized read counts) in WT cells. Only the 120 TFs that were deleted are included. (C) Histogram of the number of DE genes in TF deletion strains. (D) The number of genes differentially expressed in response to TF deletions as a function of WT expression levels of the deleted TFs. The red cluster was enriched for TFs implicated in capsule formation.
Next, for each TF deletion strain, we calculated the number of differentially expressed (DE) genes, which may be functional targets of the TF (Fig. 1C and Fig. S1B). A gene was defined as DE if the adjusted P-value of expression in the deletion versus WT was below 0.01 and the absolute log_2_ fold-change (FC) relative to WT was greater than 0.4 (https://doi.org/10.5281/zenodo.17193388 for non-TF-encoding genes and File S3 for TF-encoding genes). Half of the TF-deletion strains had 1–20 DE genes, suggesting that they have relatively specific functions in our conditions, while the other half had more than 20. Overall, the expression level of a TF in the WT strain was not correlated with the number of genes that were DE in the TF deletion (Pearson correlation ρ = 0). However, when we plotted the number of genes that responded to a TF deletion against that TF’s WT expression level, TFs clustered into three groups (Fig. 1D). TFs in the blue cluster regulated few to no genes, despite the broad range of expression levels of those TFs in WT cells. In the green cluster, the expression level of each TF correlated with the number of genes that were DE when that TF was deleted (Pearson correlation ρ = 0.57, P = 0.01). TFs in the red cluster were expressed at a relatively high level and had an intermediate number of DE genes (54–381). This group was enriched for TFs for which the corresponding deletion strain has an abnormal capsule phenotype (Hypergeometric P = 0.0007), whereas the other clusters were not (hypergeometric P = 0.97 and P = 0.98; see Supplemental Methods for details).
A TF network map for Cryptococcus neoformans
We used NetProphet3 (26) to construct a TF network map for C. neoformans. The goal of NetProphet3 is to predict direct and functional TF-target relationships, in other words, those in which a TF binds the promoter of a target gene and thereby influences its expression. It achieves this by using a machine learning (ML) algorithm to learn patterns in gene expression data that are indicative of direct regulation. ML algorithms are first trained to make predictions on cases where the true answer is known. They are then applied to make predictions where the true answer may not be known. During training, NetProphet3 uses available TF-binding location data, such as from ChIP-Seq experiments, to indicate whether binding of the TF to a gene promoter is known to occur; this is assigned as a label, with a value of 1 (binding) or 0 (no binding) (Fig. 2A). In the prediction phase, this information is no longer available, so only gene expression levels are used to predict whether a TF binds in a gene’s promoter region. In both phases, the gene expression data used to make predictions is first processed in various ways, producing the actual inputs to NetProphet3 (called features). Specifically, the features include the fold-change in expression of the target when the TF is perturbed (DE feature) and the relationships between the expression of TFs and the expression of target genes across all gene expression samples (LASSO and BART features; see Materials and Methods).
(A) Diagram of the NetProphet3 data flow, as described in the text. (B) Evaluation of the cross-trained mode, where Cryptococcus-binding location data are not used in training, for agreement with binding data. Only TF-target edges where binding data are available for the TF were evaluated. Shown is the performance of NetProphet3 (red) versus random expectation (black dashed line) and two other established methods for inferring TF networks from gene expression data. Threshold, the number of highest-scoring edges included in the network, divided by the number of TFs. (C) ChIP-Seq validation of NetProphet predictions. Expected, NetProphet score (predicted probability); observed, actual validation of predictions by ChIP-seq data. (D) Evaluation of the NetProphet3 network after integrating binding data, using the GO metric; y-axis values plotted are the median across all TFs of the -log P value of the most significant GO term for that TF. The other algorithms indicated in panel (B) are shown for comparison. Shaded area, 95% confidence interval.
When the trained NetProphet3 model is applied to gene expression data, it produces a binding probability for each TF-gene pair (Fig. 2A, Predictions), which is interpreted as the probability that the gene is a direct, functional target of that TF. The predicted probability is high if NetProphet3 identifies patterns in the gene expression data that suggest the TF binds in the gene’s promoter. Since very little ChIP-seq binding location data are available for Cryptococcus, we trained NetProphet3 on S. cerevisiae binding and expression data (see Materials and Methods for details). We refer to this as NetProphet3’s cross-trained mode. We then used this trained model on our gene expression data from C. neoformans TF deletion strains (without binding location data). The model is trained to combine evidence from features to make reasonable probability estimates. Importantly, the features do not include the identity of specific TFs or target genes. We therefore hypothesized that it would perform as well on C. neoformans data as it did on S. cerevisiae data.
The C. neoformans network map consists of direct interactions
We next used two evaluation metrics to assess whether high-scoring TF-target edges in our map are direct and functional. One of these is based on binding data and one is based on gene function annotation. For the first one, we defined a binding evaluation metric as the fraction of predicted TF-target edges that are supported by physical binding data (26). We used this to evaluate the network, which had been trained using only S. cerevisiae data and then applied to our Cryptococcus gene expression data. We performed this evaluation for 10 C. neoformans TFs with associated physical binding data: Cir1 (31), Cuf1 (32), Gat201 (33), Gat204 (33), Hap-X (31), Nrg1 (14, 27), Liv3 (33), Pdr802 (34), Sre1 (PRJNA557210), and Usv101 (14, 27).
True TF-target edges represent a small fraction of the universe of possible TF-gene edges. For this reason, we evaluated only the top-scoring TF-gene edges. To do this, we first ranked all TF-gene edges where the TF was one of the 10 that had binding data, from highest to lowest scores. We then generated networks that included a total number of edges equal to various multiples of the number of TFs. The most stringently thresholded network retained only 100 edges, equal to 10 times the number of TFs with binding data (on average, 10 per TF, although different TFs have different numbers of targets). Finally, we plotted the fraction of edges with binding support for networks of various sizes (Fig. 2B).
Because we ranked edges by score, the networks with fewer targets per TF had higher average edge scores and were expected to have greater binding support. Indeed, for the smallest network, over 50% of predicted edges were supported by binding data, compared to a random expectation of 4% (Fig. 2B). Since ChIP-seq studies can miss true interactions, this is an excellent outcome and gives us high confidence in the remaining predictions.
As expected, increasing the network size led to a reduction in the fraction of edges supported by binding data (Fig. 2B). This shows that higher-scoring edges are more likely to be direct. Furthermore, the performance of the C. neoformans network with the binding metric is comparable to that of the S. cerevisiae network produced by NetProphet3 (26), indicating that the NetProphet3 method developed using S. cerevisiae is equally applicable to C. neoformans. Since no Cryptococcus binding data were used at any point in the construction of the network, these are effectively prospective experiments, providing a good estimate of accuracy.
For comparison, we also ran two popular network inference algorithms from other labs, Genie3 (16) and Inferelator3 (35), on our data (Fig. 2B). The results showed that NP3 substantially outperformed both of these in terms of ChIP-Seq support.
The TF-target edge scores output by NP3 are intended to be estimates of the probability that the target’s promoter would be bound by the TF in a ChIP-seq experiment. To test whether these scores were reasonable estimates, we selected sets of edges within various probability score ranges and calculated both the expected validation rate for each set (the average of its edge scores) and its actual validation rate, based on ChIP-Seq data (Fig. 2C). The vertical line shows the minimum score for inclusion in the network of 100 targets per TF. Above this score threshold, the observed rate was either very close to the expected rate or above it, indicating that the scores are reasonable estimates of the validation rate and, in some cases, are even too pessimistic.
Incorporating Cryptococcusbinding data into the network
Once the evaluation with Cryptococcus binding-data was complete, we produced a second network that incorporated the binding data into the network by using NP3’s integration mode (26) (see Supplemental Methods). The two networks differ only in the edges emanating from the 10 TFs for which we had ChIP-Seq data. The second network is used for the remainder of this paper and is also available at https://doi.org/10.5281/zenodo.17193243. This network map can be searched and visualized at https://cryptococcus.net/.
The C. neoformans network map is functionally coherent
The second evaluation metric we applied was based on gene ontology (GO) biological process annotations. This GO metric reflects the degree to which each TF’s targets are enriched for genes that share a common functional annotation and are therefore biologically coherent. For this evaluation, we downloaded C. neoformans annotations from the UniProt website, most of which were transferred from S. cerevisiae based on sequence similarity (see Materials and Methods for details). Overall, 799 GO biological processes were transferred to C. neoformans, annotating over 3,000 genes, and the target sets of 30 TFs in our network were significantly enriched for at least one GO biological process term (Table S1). For each TF and each biological process annotation, we then calculated a hypergeometric P-value (see Materials and Methods for details). A significant P-value indicates that more of the TF’s target genes are annotated with the function than would be expected by chance, and hence the TF’s targets are functionally coherent. To evaluate statistical significance, we created 30 random networks by permuting TFs and targets and evaluated them using the same metric. Although there is some fluctuation in absolute level, the performance of our network significantly exceeded that of the random networks at every threshold, indicating that each TF’s targets in our network were much more functionally coherent than would be expected by chance (Fig. 2D). This further supports the validity of the network. The Genie3 and Inferelator3 networks had similar functional coherence, although NetProphet3 outperformed them on networks of intermediate size.
The most influential TFs are enriched for those that are known to regulate the capsule
For this study, we generated or obtained deletion strains lacking 120 of the 165 TFs in the Cryptococcus genome. We were unable to delete the other 45. However, NetProphet3 still predicts targets for these TFs. It does this by using non-DE features obtained by analyzing all gene expression profiles together (Fig. 2A, LASSO and BART; see Materials and Methods). We therefore had predictions for all 163 TFs that were expressed in our data set. In all, 161 of the TFs had at least one target gene under the conditions assayed. The total number of regulated genes (4,103) corresponds to 60% of those that were measurably expressed in our studies and 49% of all C. neoformans genes. This network is the basis for our analyses below.
NetProphet3 outputs probability scores for all possible TF-gene edges, most of which have very low scores. When selecting a set of predicted edges for further analysis, there is a tradeoff between including only edges predicted with high confidence and including enough edges to characterize the organism’s network comprehensively. As a reasonable compromise, for further analysis, we chose a set that included, on average, 100 targets per TF. This set consists of all edges scoring above 0.14 (Fig. 3A, dashed line).
(A) NetProphet3 probability scores for all TF-gene edges. The vertical dashed orange line indicates the score threshold corresponding to 100 TF-target edges per TF on average. (B) The number of targets per TF. (C) The number of direct TF regulators per gene. (D) The top 10 high-scoring TF-target edges in the capsule subnetwork. Colors indicate the phenotype of strains deleted for the corresponding genes: orange for hypocapsular and cyan for hypercapsular. Arrows denote activation and T-heads denote repression. All target genes except for TDA10 and CNAG_04753 encode DNA-binding proteins. The Gat201-Gat204 edge has been previously detailed (36). (E) Genes were ranked by the number of capsule TFs that regulate them, and the percentage of genes where the deletion strain exhibits a capsule phenotype was plotted for various rank cutoffs. (F) Percent of TF-TF pairs ranked above the indicated threshold that share a target and show phenotype consistency. Dashed line, percent of all TF-TF pairs that show phenotype consistency.
The more targets a TF has, the more influence it can potentially exert on the transcriptional state of the cell. In our network, the number of direct targets of each TF ranged from 7 to 884 (Fig. 3B). When we ranked TFs by the number of targets, the top 16 TFs (10%; File S5) covered 6,709 (41%) of the 16,300 TF-target edges in our network. Surprisingly, these 16 TFs regulate 81% of the 4,103 genes in the network, showing their disproportionate impact. We were interested to notice that 10 of these TFs are known to regulate the capsule, based on the phenotypes of corresponding deletion strains (File S2). This is significantly enriched over the fraction of all TFs that are known to regulate the capsule (63% versus 31%; hypergeometric P = 0.006; see Discussion).
In addition to targets per TF, we examined the distribution of TFs per target gene (Fig. 3C). The distribution peaked at one regulator per target gene and dropped off exponentially, with one outlier gene having 55 regulators (CNAG_04126, not plotted in Fig. 3C). When we ranked genes by this metric, we found that the top 10% of targets (410) participated in 33% (5,315) of the TF-target edges. On average, these top 10% targets have 13 regulators (File S5). Also, these 410 targets were generally enriched for capsule-implicated genes (those whose deletion perturbs capsule; hypergeometric P < 3 × 10^−6^; File S5).
Many TFs are both powerful regulators and highly regulated, thus occupying central positions in the network. We defined hub TFs as those that are in the top 10% both by the number of TFs that regulate them and by the number of genes they regulate. Our network contained 14 hub TFs (Table 1), of which eight are capsule implicated (hypergeometric P = 0.007; see Discussion).
The C. neoformans network map is biologically coherent
We took advantage of the abundant available data on capsule phenotypes to assess whether the C. neoformans network makes biological sense. For this, we first created a capsule subnetwork composed of all TF-target edges for which both the TF and target are known to influence capsule phenotype. This capsule subnetwork includes 113 target genes, of which 49 encode TFs, linked by 318 TF-target edges.
Focusing on the capsule subnetwork, we investigated whether the predicted direction of regulation (activation or repression) at each edge was consistent with the capsule phenotypes of the TF and its target gene. We considered an edge consistent when (1) the TF activates the target, and both demonstrate the same capsule phenotype upon deletion (hypercapsular or hypocapsular) or (2) the TF represses the target, and they have opposite capsule phenotypes. In Fig. 3D, we show the 10 highest-scoring TF-target edges of the subnetwork; notably, many of the targets themselves encode TFs. All the edges shown were consistent with the capsule phenotype (binomial P = 0.02) except for two: Gat201-PDR802 and Gat201-TDA10. In the case of TDA10, the negative regulation by Cir1 may be stronger than the positive regulation by Gat201.
We speculated that target genes regulated by the highest number of capsule-implicated TFs were most likely to have a capsule phenotype upon deletion. To test this, we ranked genes by the number of capsule TFs that regulate them from highest to lowest and calculated, at different thresholds, the percentage of genes that have a capsule phenotype (Fig. 3E). We found that genes regulated by the highest number of capsule TFs were much more likely to be capsule genes than those that were regulated by a lower number of capsule TFs: 9 out of the top 100 genes in this ranking were capsule-implicated, while only 2% of all genes are capsule-implicated (hypergeometric P = 0.02).
TF-TF pairs that share many targets
We previously showed that TFs that share large fractions of their targets are more likely to work together as a physical protein complex than those that do not (26). TF-TF pairs that share many targets but do not form a physical complex may cooperate by other mechanisms, such as being co-regulated by a third factor or binding the same DNA sequences. To gain insight into the TF-TF relationships of capsule TFs, we calculated the Jaccard similarity between target sets of all capsule-implicated TF pairs in the network (File S6). The Jaccard similarity is the number of targets in common divided by the total number of targets regulated by the two TFs. It is equal to one if all targets are shared by both TFs, and zero if none are shared. For each TF-TF pair, we determined whether this similarity was significantly higher than would be expected by chance (hypergeometric adjusted P < 0.05), indicating that the TFs are likely to cooperate (as defined above). For this analysis, we used only TF-target edges that had high NetProphet3 probability scores (greater than or equal to 0.2).
We observed that the TF-TF pairs with significant target overlap tended to regulate their targets in the same direction if the two TFs had the same capsule phenotype and in the opposite direction if the TFs had opposite capsule phenotypes. We considered the TF-TF pair to display phenotype consistency if both TFs had the same capsule phenotype and at least 80% of their shared targets were regulated in the same direction, or if they had opposite phenotypes and at least 80% of their shared targets were regulated in the opposite direction. We ranked these TF-TF pairs by the significance of their Jaccard similarity and calculated the fraction of pairs above different rank thresholds that display phenotype consistency (Fig. 3F). This fraction was much higher than would be expected at random and tended to decline as the significance of their target-set overlap declined, consistent with the notion that pairs with the highest target overlap are most likely to work in a coordinated fashion.
Relationships between cryptococcal TFs implicated in capsule have been reported in several contexts, including some efforts to map regulatory interactions (27, 36–39). However, these have generally been limited to relatively small groups of genes and focus on the direct regulation of one TF by another. Here, we focus on the overlap of targets among all capsule TFs. Figure 4 shows all pairs of capsule TFs that share many more targets in common than would be expected by chance (P <10^−5^), along with the targets that they share. Notably, half of these TFs are hub TFs, highlighting their central role in transcriptional regulation. While most of the regulatory relationships are activating, several TFs (Usv101 and Cir1) are uniformly repressive in this context.
A TF network map of shared targets of the 12 TF-TF pairs with the most significant hypergeometric P-values for their shared targets. These pairs are Gat201-Gat204, Usv101-Gat201, Swi6-Fkh2, Usv101-Cir1, Gat201-Rim101, Sp1-Cir1, Ecm2201-Gat201, Clr1-Clr3, Gat204-Ada2, Ecm2201-Gat1, Usv101-Ecm2201, Cir1-Fzc33. Orange and blue shading indicates that cells lacking the TF are hypocapsular or hypercapsular, respectively. TFs that are upregulated and downregulated in capsule-inducing conditions are indicated by upward and downward arrows, respectively. Targets whose regulators display phenotype consistency (defined in main text) are labeled in red.
Comparison of the C. neoformans and S. cerevisiae regulatory networks
Much of what we know about the functions and interactions of proteins in C. neoformans comes from studies of their orthologs in the model yeast S. cerevisiae. However, reconstruction of phylogenetic gene trees by sequence similarity comes with substantial uncertainty, and therefore identification of truly orthologous genes between species does, too. This is especially problematic for genes belonging to families that have substantial internal sequence similarity within each species, as is the case for many TFs. Even when two TFs are true historical orthologs, their functions may have diverged during evolution, so non-orthologous TFs may have more functional similarity. To identify TF pairs that were both historical orthologs and retained shared function, we combined sequence similarity with functional similarity, defined as regulating putatively orthologous target genes. We denote these pairs as (TFCn, TFSc).
Currently, the best available resource on homology between S. cerevisiae and C. neoformans proteins was compiled by Kelliher et al., who published a list of 4,572 homologous protein pairs based primarily on amino acid sequence similarity (40). In this list, a few C. neoformans genes were mapped to more than one S. cerevisiae gene because S. cerevisiae genes include paralogs derived from a whole-genome duplication event. We were curious as to whether the 63 pairs of TFs in this list, assigned by sequence homology, also showed functional similarity based on overlap of their target genes. For this purpose, we returned to the full NetProphet3 network for S. cerevisiae ([26]; https://zenodo.org/records/17196637) and C. neoformans.
First, we calculated a Jaccard similarity for each (TFCn, TFSc) pair based on their homologous target genes. As above, a higher Jaccard similarity for a (TFCn, TFSc) pair means that a higher fraction of the TFs’ targets encode homologous proteins. To determine whether the Jaccard similarity was significantly greater than what would be expected by chance, we calculated a permutation-based empirical P-value (see Materials and Methods). Surprisingly, only three (TFCn, TFSc) pairs from Kelliher et al.’s sequence-based list had significantly similar target gene sets: (Mig1Cn, Mig1Sc), (Skn7Cn, Skn7Sc), and (Gat1Cn, Gat1Sc). None of the other pairs on this list had significant target-set overlap, despite solid sequence similarity (File S7(A)). Next, we examined the seven homologous pairs from the Kelliher list for which both the C. neoformans and S. cerevisiae TF targets were enriched for a GO biological process (Table 2). Only one pair, (Fkh2Cn, Fkh2Sc), had overlap of their most significant GO terms, suggesting that the remaining pairs primarily regulate distinct biological processes.
In the previous analyses, we began with TF sequence similarity and looked for evidence of functional similarity. Next, we began with TF pairs that had similar target sets and asked whether the TFs had similar protein sequences. For this, we identified (TFCn, TFSc) pairs based solely on overlap of their targets and selected those with the highest Jaccard similarity index (>0.1). Surprisingly, BLASTP did not identify sequence similarity between most of these TFs, with only one pair (Gat201Cn, Gat4Sc) showing a significant alignment (E-value <1.0; File S7(B)). In general, pairs that shared homologous targets, and hence are more likely to have similar functions, did not have similar protein sequences.
We next looked for (TFCn, TFSc) pairs that had a balance of both sequence and functional similarity, using less stringent cutoffs. First, we ran BLASTP to align all possible (TFCn, TFSc) pairs. In all, 161 Cryptococcus TFs were in at least one pair with amino acid identity greater than 30%, 162 were in at least one pair with BLASTP E-value less than or equal to 1, and 24 were in a pair that had a significant Jaccard similarity index (corrected empirical P < 0.05; see Materials and Methods). Seventeen Cryptococcus TFs were in a pair that met all these criteria. For each of these, we calculated a score that combined their sequence and functional similarity (SF score, see Materials and Methods). For each Cryptococcus TF, we selected the (TFCn, TFSc) pair with the highest SF score. If more than one TFCn mapped to the same TFSc, the TFCn with the higher SF score was paired with that TFSc, while the TFCn with the lower SF score was mapped to its TFSc with the second-highest score.
The 17 TF pairs that we identified across these widely diverged species are derived from the same common ancestor (based on sequence similarity) and retain shared functions (based on target sets). Interestingly, the cryptococcal TFs in nine of these pairs (Table 3) do not appear in the sequence-based list of Kelliher et al. The remaining eight were included in that list, but our analysis suggests that they pair best with different partner TFs in S. cerevisiae with high sequence and functional similarity (Table 4). For only two TFs, Gat1 and Mig1, do both approaches agree (see Discussion).
Of the 165 Cryptococcus TFs, fewer than half (72) have been mapped to S. cerevisiae TFs based on either our method or sequence similarity alone. The remaining 93 have apparently diverged and may serve functions that do not exist in S. cerevisiae. Interestingly, 29 of these 93 highly diverged TFs have target sets enriched for genes that do not have homologs in S. cerevisiae (FDR = 0.05; File S8), thus highlighting the unique biology of C. neoformans compared to model yeast.
MATERIALS AND METHODS
Cell growth
To maximize reproducibility of RNA-seq, we followed strictly controlled protocols for cell recovery from frozen stocks, initial culture in YPD, inoculation into preconditioned media, and growth (see Supplementary Methods and reference [41]).
RNA-Seq
RNA isolation, library preparation, and sequencing were performed by standard methods as detailed in the Supplementary Methods and reference (41).
NetProphet3
We processed counts of measured gene expression levels into normalized log_2_ fold changes using DESeq2 (42). Samples for each TF deletion strain were compared to all 122 WT samples taken together. We then ran NetProphet3; see reference (26) for details of how NetProphet3 works. Our data set included perturbation profiles for 120 TFs. For those TFs, the differential expression (DE) feature for each gene was the log_2_ fold change of the gene’s expression in the TF deletion mutant, relative to its expression in the wild-type strain. For Gat204 and Liv3, we used TF perturbation data from Homer et al. (33). For the remaining TFs, which did not have DE data, we assigned all potential target genes a log fold change of 0 (https://doi.org/10.5281/zenodo.17193620).
LASSO (43, 44) and BART (45, 46) are regression algorithms that were used to derive additional features for each potential TF-target edge by training models to predict the expression of each gene from the expression levels of all TFs. The LASSO feature for a TF-gene edge is the linear regression coefficient learned for that TF in the model for that gene. The BART feature for a TF-gene edge is the predicted change in expression of the gene when the TF’s expression changes from its highest to lowest value found in the data (see reference 26 for details).
As described in the text, we first trained NetProphet3 on binding and expression data from S. cerevisiae (NetProphet3 cross-trained mode, described in reference 26). This training is not TF-specific; NetProphet3 simply learns patterns of gene expression features that are characteristic edges supported by binding location data. For the ten TFs that have binding location data in Cryptococcus, we trained a separate, TF-specific model that integrates binding and expression data (NetProphet3 integration mode, described in reference 26). Finally, we combined the networks produced by the cross-trained mode and the integration mode into one final network (Fig. 2A).
Evaluation metrics
Binding metric
Binding data was available for only 10 TFs (see Results and Table S2), which contained 3,308 TF-target edges. Since these data sets and the information provided with them were so heterogeneous, we used different P-value thresholds for each data set (Table S2).
GO metric
We used the GO-Term-Finder package (47) to calculate whether the targets of each TF were enriched for any GO biological process terms (26). GO annotations were downloaded from the UniProt website (File S4). We then calculated a Hypergeometric P-value for enrichment of each TF’s targets with each GO biological process term. For TFs with at least one significant GO term, we calculated the median, across all significant GO terms, of their minus log P-value.
Empirical P-value for Jaccard similarity between Cryptococcus and Saccharomyces TFs
We constructed 1,000 randomized Cryptococcus networks by shuffling NP3 scores of TF-target edges. For each randomized network, we calculated the Jaccard similarity index for each TF-TF pair. For each TF-TF pair in the original network, the fraction of TF-TF pairs in randomized networks with a greater or equal Jaccard similarity is the empirical P-value.
Sequence-functional similarity score calculation
For Cryptococcus TFs that aligned to an S. cerevisiae TF with at least 30% identity and an E-value of 1.0 or less, we compared their target sets and calculated an empirical P-value for the null hypothesis that the overlap is no larger than expected by chance (see previous paragraph). We then combined the alignment E-value with the empirical P-value for target overlap to obtain a sequence-functional similarity (SF) score via the following equation:
The SF is large when the alignment E-value is small (hence its negative logarithm is large) and the empirical P-value is small. The +1 in the denominator ensures that the fraction is still defined when the empirical P-value is zero.
DISCUSSION
A TF network map represents regulatory relationships by linking each TF to its direct, functional targets—the genes it regulates by binding to their regulatory DNA. This mechanistic approach stands in contrast to co-expression networks, which link pairs of genes that have correlated expression patterns. Co-expression networks have been constructed for Cryptococcus and used to identify new genes with a particular deletion phenotype, such as altered capsule thickness (48). TF network maps have also been used to identify novel genes with a particular phenotype by using methods such as PhenoProphet (27). In addition, TF network maps can be used for inferring changes in TF activity (49) and for determining which TFs regulate many of the same target genes, and hence may form a physical complex (26).
For S. cerevisiae, a wealth of comprehensive genomic resources has contributed to our understanding of gene regulation. There are multiple binding location data sets for most yeast TFs, obtained by using ChIP-chip (50), ChIP-exo (51), ChEC-seq (52, 53), and Transposon Calling Cards (54). These are complemented by multiple comprehensive data sets, in which gene expression was profiled before and after each TF was perturbed by gene deletion (55, 56), gene over-expression (57), or rapid protein degradation (25). We previously used some of these data sets and NetProphet3 to build a comprehensive TF network map for S. cerevisiae (26). We wished to do the same for C. neoformans, but no comprehensive binding or TF perturbation data sets existed. To fill this gap, we created a comprehensive TF perturbation response data set by carrying out RNA-Seq on strains in which the gene encoding a single TF had been deleted, a resource that we expect will be useful to the Cryptococcus research community. This data set is completely independent of our large RNA-Seq data set on responses to host-like conditions (41).
With this data set in hand, we used NetProphet3 to build a TF network map for Cryptococcus. To determine its reliability, we used two primary metrics. The first tested whether high-scoring edges (those NetProphet3 is most confident of) tend to be supported by ChIP-seq data where such data are available (Fig. 2B and C). By this metric, NP3 outperformed two other popular network inference algorithms that we applied to our gene expression data. The second metric tested whether TFs’ target sets are enriched for genes with shared biological functions at a rate higher than would be expected by chance (Fig. 2D). Both assessments indicated that our network is reliable.
Our previous Cryptococcus TF network map (27) was built using gene expression data on only 41 TF-deletion strains. The current network uses data from 79 additional TF deletion strains, so it has more high-scoring targets of the new TFs. By our measures, the new network is significantly better (Fig. S2). We also built a website for the new network where users can visualize subnetworks comprising TFs and genes of interest (https://cryptococcus.net). We anticipate this will be of use to researchers in the field as they develop and test mechanistic hypotheses about cryptococcal TFs.
Capsule-implicated TFs played a central role in our network, comprising 8 of the 14 hub TFs (Table 1), significantly more than expected by chance. Surprisingly, no TF had capsule-implicated genes, as more than 36% of its targets. Even for major capsule regulators such as Gat201, Cir1, and Nrg1, only 4%–5% of their targets were capsule-implicated genes. This suggests the unexpected idea that there may not be any capsule-specific TFs. Instead, capsule genes appear to be regulated by TFs that also regulate many other processes. Large polysaccharide capsules are phylogenetically restricted among fungi, presumably evolved to manage species-specific challenges. Thus, one might expect that they could be deployed specifically in response to such stresses, independent of phylogenetically ubiquitous stress responses. However, our network suggests that this is not the case. Instead, it may be better to think of the capsule as one of a large set of tools that are deployed together in response to certain stressful conditions. Nonetheless, genes that are coordinately regulated by multiple capsule-implicated TFs (Fig. 4) may be involved in capsule development or other key stress-response functions and hence of interest for follow-up studies.
Cryptococcus genes are often named after their putative orthologs in S. cerevisiae and implicitly assumed to have a similar function. Putative orthologs are identified by protein sequence similarity (40). However, TFs typically consist of a small DNA-binding domain (DBD) and large intrinsically disordered regions that contain no identifiable functional domains. These disordered regions tend to have little sequence conservation across long evolutionary distances. Therefore, TF alignments between distantly related species are often possible only within the DBD, which comprises a small fraction of the TF protein sequence. Furthermore, it is increasingly understood that, while DBDs may define a core DNA-binding motif, binding locations are significantly impacted by the disordered domains (58, 59). These observations suggest that inferring functional homology between TFs from sequence alone may be difficult and error-prone. To complement sequence-based approaches, we attempted to determine functional homology by comparing C. neoformans and S. cerevisiae TFs according to how many putatively orthologous target genes they share. By combining sequence homology and functional homology, we identified 17 pairs consisting of a Cryptococcus TF and an S. cerevisiae TF that we consider true functional orthologs. Cryptococcus TFs that are not paired to S. cerevisiae orthologs may have evolved species-specific functions and therefore regulate many target genes that do not have S. cerevisiae orthologs. Alternatively, they may have retained their function while their target genes diverged from S. cerevisiae to the point where their orthologs cannot be identified by sequence comparison. Interestingly, sequence homology alone had suggested no orthologs for nine of the cryptococcal TFs in our 17 pairs in a previous study (Table 3), while for the other eight, our method suggested different orthologs from those suggested by sequence similarity alone. Overall, the sequence-based method tended to agree with our method only when protein identity was very high (~70%). This suggests that TF network maps may, in general, be important for identifying functional orthologs between highly diverged species.
We were able to construct a high-quality TF network map using a large new data set of gene expression profiles from strains lacking TFs (perturbation response data). However, more data resources would make the network map even better. Although genome sequence analysis identified 165 putative TFss in the genome of Cryptococcus neoformans, we were able to obtain or construct deletion mutants for only 120 of them. Some of the remaining 45 have been added to the Madhani deletion collection since this work was carried out. The remainder may be essential (60). For those that are essential, newly implemented methods in the field, such as inducible protein degradation (61), may be useful. Another complementary resource that is needed to fully illuminate the regulatory network is a comprehensive TF binding location data set, potentially generated using ChIP-exo, ChEC-seq, or Transposon Calling Cards (62, 63). Future integration of such data into our TF network using NetProphet3 will yield an enhanced TF network map that will further advance research into this fascinating and important pathogen.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Meya DB, Williamson PR. 2024. Cryptococcal disease in diverse hosts. N Engl J Med 390:1597–1610. doi:10.1056/NEJ Mra 231105738692293 · doi ↗ · pubmed ↗
- 2Tugume L, Ssebambulidde K, Kasibante J, Ellis J, Wake RM, Gakuru J, Lawrence DS, Abassi M, Rajasingham R, Meya DB, Boulware DR. 2023. Cryptococcal meningitis. Nat Rev Dis Primers 9:62. doi:10.1038/s 41572-023-00472-z 37945681 · doi ↗ · pubmed ↗
- 3Rajasingham R, Smith RM, Park BJ, Jarvis JN, Govender NP, Chiller TM, Denning DW, Loyse A, Boulware DR. 2017. Global burden of disease of HIV-associated cryptococcal meningitis: an updated analysis. Lancet Infect Dis 17:873–881. doi:10.1016/S 1473-3099(17)30243-828483415 PMC 5818156 · doi ↗ · pubmed ↗
- 4Denning DW. 2024. Global incidence and mortality of severe fungal disease. Lancet Infect Dis 24:e 428–e 438. doi:10.1016/S 1473-3099(23)00692-838224705 · doi ↗ · pubmed ↗
- 5Taylor JW, Berbee ML. 2006. Dating divergences in the fungal tree of life: review and new analyses. Mycologia 98:838–849. doi:10.3852/mycologia.98.6.83817486961 · doi ↗ · pubmed ↗
- 6Boodwa-Ko D, Doering TL. 2024. A Quick re CAP: discovering Cryptococcus neoformans capsule mutants. J Fungi (Basel) 10:114. doi:10.3390/jof 1002011438392786 PMC 10889740 · doi ↗ · pubmed ↗
- 7Agustinho DP, Miller LC, Li LX, Doering TL. 2018. Peeling the onion: the outer layers of Cryptococcus neoformans. Mem Inst Oswaldo Cruz 113:e 180040. doi:10.1590/0074-0276018004029742198 PMC 5951675 · doi ↗ · pubmed ↗
- 8Wang ZA, Li LX, Doering TL. 2018. Unraveling synthesis of the cryptococcal cell wall and capsule. Glycobiology 28:719–730. doi:10.1093/glycob/cwy 03029648596 PMC 6142866 · doi ↗ · pubmed ↗
