ToxAssay: a hierarchical model-driven tool for advanced toxicogenomics biomarker discovery
Md Masud Rana, Md Nurul Haque Mollah, Mohammed H Albujja, Sibte Syed Hadi, Fan Liu

TL;DR
ToxAssay is a new tool that improves the discovery of toxicity biomarkers by modeling complex genetic relationships in drug toxicity data.
Contribution
ToxAssay introduces a hierarchical linear model for more accurate and efficient biomarker identification in toxicogenomics.
Findings
ToxAssay identified 71 key genes and 26 core genes with high accuracy in glutathione depletion-induced toxicity.
The tool showed strong cross-correlation (Pearson’s r = 0.88) with external datasets.
An AOP analysis algorithm linked glutathione depletion to disease outcomes.
Abstract
Understanding the genetic basis of drug-induced toxicity is crucial for drug development. In-silico analysis of toxicogenomics datasets facilitates early detection of toxicity biomarkers. However, existing tools struggle with the complex interdependencies among hierarchically structured variables, leading to inaccurate biomarker identification. To address this limitation, we developed a Hierarchical Linear Model (HLM) and implemented it in the R package ToxAssay, offering extensive functionality for comprehensive toxicity assessment. ToxAssay outperforms existing methods by improving biomarker detection and computational efficiency. Applied to glutathione depletion-induced toxicity, it prioritized 71 key genes and identified 26 core genes with high discriminative accuracy (AUC = 0.97) and strong cross-correlation (Pearson’s r = 0.88) with external datasets. Additionally, our advance…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7- —Naif Arab University for Security Sciences10.13039/100019920
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBioinformatics and Genomic Networks · Gene expression and cancer classification · Computational Drug Discovery Methods
1 Introduction
Drug-induced organ toxicity damages organs in the body due to the adverse effects of a drug or chemical substance, leading to a high failure rate of new drug candidates in clinical trials. Approximately 30% of drug candidates fail due to unwanted or unmanageable toxicity between 2010 and 2017 (Dowden and Munro 2019; Sun et al. 2022). Failing to identify toxicity during the testing stage can risk public health with delayed adverse effects. Early detection of toxicity biomarkers can reduce costs, rejection rates, and the need for human and animal testing, enhancing patient safety (Khan et al. 2014). In-silico analysis of toxicogenomics data has proven to be an effective and powerful approach for the early detection of toxicity biomarkers (Pognan et al. 2023).
Several toxicogenomics databases have been established, among which Open TG-GATEs (Igarashi et al. 2015), DrugMatrix (Ganter et al. 2006) and Comparative Toxicogenomics Database (CTD) (Davis et al. 2023) are the three main public databases directly linking toxicity with gene expression data (Alexander-Dann et al. 2018). The Open TG-GATEs, designed for toxicogenomics, emphasizes toxic doses and includes compounds known for toxic effects. DrugMatrix, on the other hand, covers a wider range of chemicals, focusing on effective doses. The CTD stands out for its comprehensive curation of scientific literature, linking environmental factors to human health with a vast repository of chemicals, genes, and diseases. These databases have enabled extensive investigations, such as generating biomarker signatures and characterizing mechanisms of toxicological endpoints in the liver, including glutathione depletion, phospholipidosis, cholestasis, fibrosis, cirrhosis, and carcinogenesis. Pharmaceutical research nowadays heavily relies on these findings in understanding the drug toxicity (Uehara et al. 2010; Schomaker et al. 2019).
Commonly used open-source software tools for the analysis of large-scale toxicogenomics datasets include ToxicoDB (Nair et al. 2020) and Toxygates (Nyström-Persson et al. 2013, 2017). ToxicoDB provides synchronized compound annotations, dynamic compound plots based on time and dosage, and pathway analysis related to toxicity. Toxygates offers an integrated platform for ranking compounds using pattern-based approaches reliant on relevant candidate gene expression values. The core methods used for identifying differentially expressed genes (DEGs) in ToxicoDB, and Toxygates are limma and Welch’s t-test, respectively. Moreover, a majority of independent studies in toxicogenomics used Welch’s t-test (or ANOVA) and limma for identifying DEGs (Kiyosawa et al. 2004; Kiyosawa et al. 2007; Ellinger-Ziegelbauer et al. 2008; Uehara et al. 2008; Gao et al. 2010; Huang and Tung 2017).
Although these existing tools have demonstrated their applicability from different perspectives, there is significant room for improvement. Firstly, different methods often produce significantly divergent P-values on same datasets due to a lack of consideration for the hierarchical structure of experimental design (Kulkarni et al. 2022). For example, ToxicoDB is designed to analyze single-compound data, accounting for dose and time factors in the design matrix to identify DEGs, but it does not consider the interdependencies between time and dose across a group of similar compounds. In contrast, Toxygates follows a standard procedure to identify DEGs by comparing two sets of compounds (e.g. toxicity-positive and toxicity-negative), but it simplifies the approach by pooling samples into groups, overlooking the specific interdependencies of dose and time factors. Secondly, there has been limited methods for AOP analysis that connect molecular-level biological perturbations to adverse outcomes. Lastly, user-friendly tools for automated post-marker discovery analysis are lacking, making these tasks challenging for wet-lab researchers. These limitations underscore the necessity of developing a sophisticated tool capable of analyzing various toxicities, selecting relevant biomarkers with reduced redundancy, and addressing a broad spectrum of inquiries with optimized statistical power.
In this study, we present a novel statistic for identifying DEGs using HLM that takes into account the hierarchical data structure inherent in toxicogenomics data. We demonstrate the statistical power of our model through extensive computer simulations. We also developed an association rule mining algorithm for AOP analysis that maps compound perturbation data to disease data from CTD. We implemented our newly developed statistical models and algorithms in a user-friendly R package ToxAssay, offering extensive functionality for post-marker discovery analysis. To demonstrate its utility, we applied ToxAssay to analyze glutathione depletion-induced toxicity, a key endpoint, using the Open TG-GATEs dataset.
2 Results
2.1 Overview of ToxAssay
The workflow of ToxAssay is depicted in Fig. 1 and the key functions of ToxAssay are described in Supplementary Method A. In brief, ToxAssay is an R-based software package designed for the comprehensive evaluation of drug-induced toxicity utilizing complex toxicogenomics databases, including perturbation data from sources such as Open TG-GATEs and DrugMatrix, along with manually curated relational data from the CTD. The package offers a comprehensive suite of functions, encompassing: (i) identification of molecular markers for targeted toxicity, including DEGs, advance outcome pathways (AOPs), functional pathways, and Protein–Protein Interaction (PPI) networks; (ii) development of optimized machine learning classifiers for predicting the targeted toxicity in test samples of compounds.
Overview of ToxAssay. (a) workflow of ToxAssay in five steps. Step 1: users specify a list of compounds (e.g. toxicity-positive and toxicity-negative) for comparison to study targeted toxicity. Step 2: extracts raw data from existing databases. Step 3: The raw data is annotated with compound and gene identifiers. Step 4: formats and prepares the data for analysis. Step 5: ToxAssay provides multiple analysis functions, generating tables and graphical outputs for DEGs, gene expression patterns, AOP associations, functional pathways, PPI networks, and toxicant prediction. Note that Steps 2–4 can be skipped if the input data (Y, X, V1, and V2) is already available. (b) Algorithm flowchart highlighting main functions in ToxAssay. A brief description of the flowchart is provided in Supplementary Note. Key functions tgx_genes, core_degs, and get_aop are detailed in the gray dotted boxes at the bottom.
An important innovative component of ToxAssay is that it implements a newly developed HLM that model gene expressions, considering factors such as compounds, dosages, and time points, in a nested data structure (compound → dose → time point) (Supplementary Fig. S1). Details regarding the development of HLM are provided in Methods. In brief, let represent the gene expression value measured in the m-th replication at the l-th time point following exposure to the k-th dose of the j-th compound. For each compound-dose-time combinations, the HLM is written as , where denotes the residual error term and represents the intercept for time level. The intercept for the dose level is , where is the intercept at the compound level.
Let be a set of compound groups (typically , representing targeted toxicity-positive and toxicity-negative) to be compared for assessing targeted toxicity, where each contains compounds with similar toxicity characteristics. Then, the intercept of compounds in the i-th group is , where is the overall intercept (ie, mean) across all compounds and is the mean effect of the i-th compound group. The null hypothesis of is tested using a F-statistic with and degrees of freedom, adjusted for the extent of dependency in the data, which can be captured by the overall intracluster correlation (ICC), , where is the total of the intercept variances. Here, and represent the total number of possible clusters and samples in the data, respectively.
Let be the initial set of DEGs identified by the proposed F-statistic. To refine the initial set and remove genes influenced by chemical-specific outlying expressions, ToxAssay generates gene sets, using a cross-validation framework. Finally, the reduced DEGs set is obtained as: (see Methods for details).
ToxAssay further utilizes DEGs for AOPs analysis using a proposed algorithm developed by association rule mining. The DEGs is subsequently used for functional pathway and PPI analysis to elucidate underlying molecular mechanisms and identify most interconnected and biologically relevant core DEGs (CDEGs) (see Methods for details). Finally, the identified gene signature is utilized to develop machine learning classifiers, aimed at predicting the targeted toxicity of the toxicant to be tested.
2.2 Power of ToxAssay via computer simulations
Computer simulations were carried out to assess the performance of ToxAssay in terms of statistical power (scenario 1) and type-I error ( ) (scenario 2), in comparison with two other competing software packages, ToxicoDB (Nair et al. 2020) and Toxygates (Nyström-Persson et al. 2013; 2017). The simulations were conducted under both balanced (equal number of compounds in the targeted toxicity-positive and toxicity-negative groups) and unbalanced (unequal number of compounds) designs.
Scenario 1 compares the statistical power between ToxAssay, ToxicoDB, and Toxygates in detecting DEGs across various combinations of effect sizes ( ) and compound numbers ( ) under varying levels of overall ICC ( ). In a balanced design with no data dependency ( ), all packages exhibit equivalent power across all combinations of and (Fig. 2a–c). However, as increases, ToxAssay consistently outperforms the other packages, demonstrating superior power in every combination of and . Specifically, at low dependency ( ), power increases by approximately 5% (Fig. 2d–f). At moderate dependency ( ), power improves by roughly 10% (Fig. 2g–i), and at high dependency (ρ = 0.8), power surges by approximately 20% when is 15 and is 0.50 (Fig. 2j–l). A similar trend is observed under unbalanced designs with the same parameter settings (Supplementary Fig. S2).
Power comparison between ToxAssay, ToxicoDB and Toxygates via simulations under balanced design. Gene expression data were simulated for 10 000 genes, with 10% exhibiting a true effect, according to the parameter grid: δ∈{0, 0.1, 0.2,…,1}, ρ∈{0, 0.1, 0.5, 0.8} and r∈{4, 6, 10, 14, 20, 30}. The power of ToxAssay, ToxicoDB, and Toxygates was evaluated under four conditions: ρ=0 (a–c), ρ=0.1 (d–f), ρ=0.5 (g–i), and ρ=0.8 (j–l). Each scenario was assessed using 100 replicates at α=0.05. The statistical power of each testing framework is illustrated using LOESS smoothing. Contour lines indicate combinations of r and δ that yield equivalent statistical power, quantitatively expressed. A color gradient highlights regions of varying statistical power, with areas of low power shown in purple and areas of high power in yellow. The x-axis represents the number of compounds (r), and the y-axis represents the effect size (δ), while ρ denotes the level of data dependency. Notably, ToxAssay demonstrated increased power as ρ increased.
Scenario 2 explores the type-I error rate ( ) across different values under both balanced and unbalanced design (Supplementary Fig. S3). Simulation results indicate that consistently remains at 5% or slightly above, under varying . For perfectly non-DEGs, where there is no variation in the intercepts, consistently stabilizes at 5%. Even in the presence of small variation in the intercepts, allowing for minor biological noise, there is a slight elevation in , regardless of changes in . This underscores the high stability of in the ToxAssay.
2.3 ToxAssay R package and computational efficiency
ToxAssay is a user-friendly, open-source R package available at https://github.com/Fun-Gene/toxassay. Toxicogenomics data analysis can become computationally intensive, especially with large datasets and repetitive tasks across multiple endpoints. We assessed the computational efficiency of ToxAssay with ToxicoDB and Toxygates, using simulated data (1, 5, 10, 20, and 50 thousand genes; 6, 10, 20, 30, 40, and 50 compounds). ToxAssay consistently outperformed all benchmarks in terms of peak memory usage and computation time in all dataset configurations (Supplementary Fig. S4).
2.4 Identification of DEGs associated with glutathione depletion
To empirically showcase ToxAssay’s power and proficiency in evaluating toxicological endpoints, we applied it to analyze the Open TG-GATEs dataset for glutathione depletion-induced toxicity, comparing our results with four previous studies. These previous studies proposed different and partially overlapping gene sets as biomarkers for early detection of glutathione depletion, ie 53 (53 probes, set-A) (Kiyosawa et al. 2004), 145 (161 probes, set-B) (Kiyosawa et al. 2007), 64 (78 probes, set-C) (Gao et al. 2010), and 385 (460 probes, set-D) (Yamauchi et al. 2011) (Supplementary Table S1). Of note, there was a remarkable scarcity of overlapping genes between any two previously proposed gene sets (ranging from 2.1% between set-A and set-B to 35.9% between set-C and set-D).
Out of the 15 027 unique genes in the gene pool derived from > 31 000 GeneChip^®^ RAE 230A gene probes, ToxAssay identified 71 significant DEGs ( after Bonferroni correction, Fig. 3a, Supplementary Table S2). The overall ICC values of these genes in TG-GATEs were indeed very high (mean = 0.83, sd = 0.06, Supplementary Fig. S5), confirming a high degree of data dependency and consistent with our simulation analysis. Among these 71 genes, 60.6% (43/71) overlapped with previously reported gene sets, with increased pair-wise overlaps with any previously proposed gene set (12.7% with set-A, 22.5% with set-B, 14.1% with set-C, and 46.5% with set-D, Fig. 3b). We confirm the overlapping finding of Txnrd1 in set-B and set-D as the most significantly DEG in our study. The average fold change expression ( ) of Txnrd1 across different levels between glutathione-depletion positive compounds (GDPCs) and glutathione-depletion negative compounds (GDNCs) groups clearly demonstrates significant difference, as well as cluster-related variations and patterns in the data (Fig. 3c–e).
DEG identification in glutathione depletion. (a) Volcano Plot of DEGs. The x-axis represents the average log2(FC) in gene expression between GDPCs and GDNCs across dose and time, while the y-axis shows the −log10 P-values. Red dots indicate novel DEGs, blue dots indicate previously reported DEGs, light blue dots represent chemical-specific (biased) DEGs, and grey dots denote non-DEGs. (b) Venn diagram and upset plot of biomarker genes. The overlap and uniqueness of gene sets identified by ToxAssay and four prior studies. The ellipses represent intersecting gene sets for ToxAssay (purple), Set-A (red), Set-B (olive), Set-C (green), and Set-D (blue). The bar plot displays the count of unique upregulated and downregulated genes. (c) The GDPCs regulated approximately 95% of the overall expression of top DEG Txnrd1, in contrast to the minimal expression regulated by the GDNCs. Txnrd1 exhibited higher average log2(FC) in the GDPCs group across compounds, doses, and time points compared to GDNCs. (d) The figure illustrates clustering in gene expression levels using Txnrd1 as an example. Compared to the traditional t-test (gray dashed line), HLM provides a significantly better fit across doses (top) and time points (bottom). (e) Heatmap of Txnrd1 expression clearly distinguishes GDPCs and GDNCs groups at different doses (top) and time points (bottom).
There was strong evidence supporting our newly identified genes (28/71) being functionally involved in glutathione depletion. For example, Mgst2 and G6pd have been long used as the index genes for ranking GDPCs but failed to be identified by all previous studies (Nyström-Persson et al. 2013). We also observed significant differential expression and cluster-related variations in these genes across different levels (Supplementary Fig. S6). In addition, Txn1 and Raf1 have been reported as important contributors to the process of glutathionylation, and others including Cyp1a2, Maff, Usp10, Zfp703, Cp, Eaf1, and Eif4a1 are known to play significant roles in oxidative stress (Poulsen et al. 1998; Lu 2013; Aoiadni et al. 2021; Turner and Turner 2021; Sango et al. 2022). These lines of functional evidence empirically support the improved statistical power of our method in detecting DEGs.
2.5 ToxAssay DEGs unveil AOPs in glutathione depletion
We further explored AOPs related to glutathione depletion, focusing on liver disease outcomes categorized by Medical Subject Headings (MeSH). Through association rule mining analysis, we identified 25 liver diseases linked to 32 DEGs, resulting in 392 significant gene-disease associations (Fig. 4). Consistent with existing knowledge, several AOPs associated with glutathione depletion were highlighted, including non-alcoholic and alcoholic fatty liver disease (MeSH: D005234, D005235, and D065626), cholestasis and cholestatic injury (D002779, D001651, and D002780), hepatitis (D006505, D006519, D006520, and D019693), liver cirrhosis (D008103, D008104, and D008105), and chemical/drug-induced liver injury (D056486 and D056487) (Yuan and Kaplowitz 2009; Lu 2020; Vairetti et al. 2021). A more comprehensive list of AOPs induced by glutathione depletion and their associated genes is provided in Supplementary Table S3.
Association between clusters of DEGs and AOPs. (a) DEG clusters. (b) Associations between DEGs and disease MeSH IDs, with the strength of association weighted based on lift values. (c) Disease clusters.
Cluster analysis of gene-disease associations identified five distinct gene clusters (G1 to G5), each showing similar association patterns within the cluster but differing between clusters, suggesting shared molecular mechanisms within each group (Fig. 4a and c). The six genes in G1 (Ephx1, Atf4, Cp, Gsr, Raf1, and Mdm2) are mainly involved in cellular stress responses and apoptosis regulation. The five genes in G2 (Nol8, Maff, Creg1, Slc43a1, and Sc5d) contribute to cellular homeostasis and metabolic regulation. The nine genes in G3 (Btbd2, Gsta3, Epb41, Wdr43, Cyp1a2, Mgst2, Paics, Slc17a3, and Zwint) function in detoxification and cellular metabolism. The nine genes in G4 (Txnrd1, Nqo1, Hmox1, Gclm, Gclc, Akr7a3, Abcc4, Osgin1, and Slc3a2) play key roles in protecting against oxidative stress, likely regulated by nuclear factor erythroid 2–related factor 2 (Nrf2). Finally, the three genes in G6 (Cbr1, Gnl3, and Hspb8) are associated with cellular protection and stress response. While these annotations align with existing literature, further statistical validation is required to confirm their significance in this specific dataset.
Similarly, cluster analysis revealed three distinct disease clusters (D1 to D3). Diseases in D1 were predominantly explained by genes from G1, G2, and G4, indicating a shared molecular basis between these disease and gene clusters. D2 was influenced by a wide spread of the tested genes, making it difficult to attribute specific functional annotations to this cluster. In contrast, D3 exhibited a more specific gene-disease relationship, with genes from G4 being exclusively associated with diseases in D3, including fatty liver (D005234 and D065626), cholestasis (D002779 and D001651), liver cirrhosis (D008103 and D008105), hepatitis (D006505), hepatocellular carcinoma (D006528), liver failure (D017093 and D017114), and drug-induced liver injury (D006528). These findings overall reveal distinct gene and disease clusters associated with glutathione depletion, offering insight into potential molecular mechanisms.
2.6 Core differentially expressed genes (CDEGs) in glutathione depletion
We explored the functional roles of 71 significant DEGs using databases KEGG (Kanehisa et al. 2017), gene ontology (GO) (Harris et al. 2004), Reactome (Croft et al. 2011) and Wikipathway (Slenter et al. 2018) (Supplementary Table S4). Notably, we identified significant enrichment in several key metabolism-related pathways, including metabolism of xenobiotics by cytochrome P450 ( ), glutathione metabolism ( ) and metabolic pathways ( ), as well as in various cancer-related pathways such as ferroptosis ( ), chemical carcinogenesis ( ) and hepatocellular carcinoma ( ) in the KEGG database (Fig. 5a). Cluster analysis based on the semantic similarity of GO biological process terms highlighted clusters related to cellular redox homeostasis, response to toxic substances, response to thyroxine, as well as glutathione metabolism and small molecule metabolic processes (Fig. 5b). Our findings suggest that our primary set of 71 DEGs is significantly more relevant to glutathione depletion than previously identified gene sets (Fig. 5c), a finding further supported by pathway analysis using Reactome and Wikipathways databases (Supplementary Fig. S7).
Enriched pathways in glutathione depletion. (a) KEGG pathway enrichment. A total of 26 DEGs were enriched in the top 10 KEGG pathways (Q < 0.01). The size of the nodes representing pathway terms corresponds to the −log10 P-values of enrichment. (b) GO biological process clusters. From 46 enriched GO biological process terms, six parent clusters were identified using a high similarity threshold of 0.9. The bars within each parent pathway term represent the enriched terms, with the intensity of blue indicating the similarity within clusters. The x-axis displays the GO IDs, and the y-axis shows the −log10 P-values of the enriched terms. (c) Comparative Enrichment: KEGG and GO terms enriched by ToxAssay were more relevant to glutathione depletion than those identified in previous studies. The thickness of the lines reflects the −log10 P-values of the pathway terms.
A PPI analysis of the 71 DEGs using STRING (Szklarczyk et al. 2021) resulted in a dense PPI network consisting of 70 nodes and 355 edges. Except one, all the 71 DEGs are protein-coding, and among which, 62 genes formed an interconnected network within the PPI framework (Fig. 6a, Supplementary Table S5). By applying a degree centrality threshold of 20 within the dense PPI network, we delineated a core network of 15 hub genes (Fig. 6b). Additionally, a parallel analysis using an eigenvalue centrality threshold of 0.7 confirmed a similar set of hub genes, along with Ugt2b7 (Fig. 6c).
PPI subnetworks and hub genes functionally involved in glutathione depletion. (a) The PPI network of DEGs revealed three distinct subnetworks, with Subnetwork 1 (SN1) being the densest, containing all hub genes identified by the degree-based centrality measure (b) or the eigenvector-based centrality measure (c). (d) SN1 showed the most widespread enrichments in pathways identified in KEGG. (e) Expression patterns of genes in SN1 clearly differentiate samples from GDPCs and GDNCs. The inner circle heatmap is generated from average log2(FC) data at 24-hour high dose, while the outer circle heatmap is based on average log2(FC) data across all time points at high dose levels.
In order to minimize the variability in results from differing algorithmic choices, we employed three widely used algorithms including edge-betweenness centrality (EBC) (Girvan and Newman 2002), Walktrap (Clauset et al. 2004), and Fastgreedy (Pons and Latapy 2005) to identify co-regulated subnetworks within the PPI network. Genes in the subnetworks identified by the EBC algorithm were found to play a dominant role in identified KEGG pathways and displayed distinctly clear clustering patterns in samples (Fig. 6d and e). Similarly, genes in the key subnetworks identified by the Walktrap and Fastgreedy algorithms demonstrated a significant influence on identified KEGG pathways and exhibited distinct clustering patterns (Supplementary Fig. S8). Genes in the key subnetworks identified by all three algorithms, which play a dominant role in KEGG pathways, are largely common and also exhibit a similar dominant effect in other pathway databases (Supplementary Fig. S9).
Using a soft threshold of in our core DEGs identification method (see Methods), we identified 26 CDEGs (Txnrd1, Akr7a3, Srxn1, Ugdh, Hmox1, Gclc, Gsr, Gclm, Ftl1, Zfand2a, Cyp1a2, Hspb8, Ephx1, Txn1, Slc3a2, Mgst2, Gsta3, Atf4, Slc7a11, Slc17a3, Gstm1, Abcc4, Gstm4, G6pd, Nqo1, and Cp), encompassing all hub genes identified in the PPI network and those strongly associated with significant AOPs (Supplementary Fig. S10). This highly interconnected and biologically significant gene signature underscores their critical roles in glutathione depletion.
2.7 Core gene set accurately predicts glutathione-depleting hepatotoxicants
Our analysis further probed the predictive capability of the 26-core gene signature on samples subjected to glutathione-depleting agents through both unsupervised hierarchical clustering and supervised classification methods. The unsupervised hierarchical clustering delineated two distinct expression profiles within different data subsets (Supplementary Fig. S11), accurately differentiate samples from GDPCs and GDNCs groups (Fig. 7a and b). This performance of our 26 CDEGs markedly exceeded previously suggested gene sets under the same analyses (Supplementary Fig. S12).
Accuracy in predicting compound groups (GDPCs and GDNCs) using all 71 DEGs or 26 CDEGs. (a) Hierarchical clustering of the 26 CDEGs based on average log2(FC) clearly distinguishes samples from high-dose GDPCs and GDNCs groups at the 24-hour time point. (b) Clustering across all time points similarly separates GDPCs and GDNCs. (c) Prediction accuracy of GDPCs and GDNCs using a logistic regression model shows that both the 71 DEGs (pink) and 26 core DEGs (red) perform similarly (AUC = 0.97), outperforming previously proposed gene sets. (d) Cross-correlation analysis of DEG expression between the Open TG-GATEs dataset (used in this study) and the DrugMatrix database (external dataset). DEGs show high cross-correlation, while a randomly drawn set of 71 background genes (gray density) and 26 background genes (red density) from the genome exhibit low correlation. The solid lines on the density plot represent average cross-correlation for the random 26 core DEGs (red) and 71 background genes (gray), with dashed lines representing 26 core genes (red) and 71 DEGs (gray).
In supervised classification, employing a logistic regression model across 100 iterations within a 10-fold cross-validation framework, we attained notable predictive accuracy (AUC = 0.97), equating to the accuracy obtained using the complete set of 71 DEGs (AUC = 0.97). This significantly surpassed the outcomes of prior studies, which reported an average AUC of 0.86 with a maximum value achieved at 0.95 (Fig. 7c, Supplementary Table S6).
2.8 Core gene set shows high cross-correlation across databases
To assess the consistency and robustness of ToxAssay gene signature across independent toxicogenomic platforms, we performed a cross-correlation analysis between the Open TG-GATEs and DrugMatrix datasets, specifically focusing on Methimazole, the only compound common to both datasets with identical experimental setups. Methimazole is a known glutathione-depleting hepatotoxicant, making it an ideal compound for evaluating the generalizability of the ToxAssay gene signature in predicting glutathione depletion-induced hepatotoxicity.
The analysis revealed a strong correlation in gene expression perturbations between the two datasets, with Pearson correlation coefficients of 0.90 for the 71 DEGs and 0.88 for the 26 CDEGs (Fig. 7d). These high correlation values demonstrate the robustness and consistency of the ToxAssay gene signature across distinct experimental platforms, supporting its ability to predict glutathione depletion-related toxicity. For further validation, we compared the observed correlations with those obtained from number-matched random gene signatures. After 10 000 replicates, the random gene signatures yielded much lower correlation values of 0.45 for the 71 gene set and 0.43 for the 26 gene set (P < 0.0001). This clear difference between the observed and random correlations highlights the reliability and specificity of the ToxAssay gene signature, confirming its robustness in identifying biomarkers and predicting drug-induced toxicity across diverse experimental contexts.
3 Discussion
We present ToxAssay, an R-based software for comprehensive drug-induced toxicity evaluation, utilizing major toxicogenomics databases such as Open TG-GATEs (Igarashi et al. 2015), DrugMatrix (Ganter et al. 2006), and CTD (Davis et al. 2023). While the current implementation focuses on gene expression data from Open TG-GATEs and DrugMatrix, ToxAssay can easily adapt to other datasets like CMap or L1000 (Subramanian et al. 2017) and other omics data types, provided the input is properly formatted. Although several databases catalog the toxicity profiles of compounds, they lack molecular-level insights into toxicity mechanisms (Chen et al. 2016; Lea et al. 2017; Wu et al. 2023). ToxAssay addresses this gap by providing functionalities for in-depth molecular analysis and advanced machine learning models to evaluate a wide range of toxicity endpoints.
ToxAssay’s novel HLM enhances the detection of DEGs, outperforming existing tools like ToxicoDB (Nair et al. 2020) and Toxygates (Nyström-Persson et al. 2013, 2017) in both power and efficiency. Our HLM accounts for the inherent hierarchical structure in the data, such as compound, dose, and time points (compound → dose → time point), which is often overlooked by existing methods. The model is particularly effective when the overall ICC is greater than zero (ie, ), indicating clustering at any level of the data (compound, dose, or time points). This is evident from both computer simulations and empirical application to glutathione depletion analysis, where our identified gene sets showed greater commonality, functional relevance, and higher discriminative power compared to previously reported sets.
A novel element along with the HLM in ToxAssay is our association rule mining algorithm for AOP analysis. This algorithm improves the mapping of compound perturbation data to disease datasets from curated databases, facilitating the identification of association patterns between clusters of diseases and DEGs. Unlike existing tools such as ToxicoDB (Nair et al. 2020) and Toxygates (Nyström-Persson et al. 2013, 2017), which lack this functionality, our approach is crucial for linking biological processes to adverse outcomes. Applying this algorithm, we found that the DEG set in cluster G4 is uniquely associated with diseases in cluster D3, revealing a strong connection between oxidative stress genes and liver diseases, a finding consistent with previous mice gene-editing studies that demonstrated the pivotal role of oxidative stress regulation in the progression of liver pathology (Rius et al. 2003; Krejsa et al. 2010; Ahmed et al. 2011; Brennan et al. 2017; Lee et al. 2019; Canesin et al. 2022; Yan et al. 2023).
Unlike traditional AOP approaches that emphasize curated pathway analyses, network modeling, or database integration (Fay et al. 2017; Spinu– et al. 2019; Saarimäki et al. 2023; Wiklund et al. 2023), our method uniquely incorporates association rule mining to uncover mechanistic links between gene-expression changes and disease states. To our knowledge, this integration has not been benchmarked against existing AOP methods, as no comparable frameworks explicitly apply association rule mining. Thus, our algorithm may offer a novel and complementary perspective within the AOP computational toolkit.
Another novel element along with ToxAssay is our core gene identification method, which integrates pathway enrichment and PPI analysis. Unlike previous algorithms such as EnrichNet (Glaab et al. 2012) and NetPEA (Liu et al. 2017), our method focuses on identifying densely interconnected, biologically functional regions within the network rather than solely discovering new pathways. Instead of relying on a single community detection method like EBC (Girvan and Newman 2002), Walktrap (Clauset et al. 2004), or Fastgreedy (Pons and Latapy 2005), we employ a stacking algorithm that enhances core gene detection by minimizing noise and reducing false-positive interactions, which is crucial for pinpointing key genes in large and complex networks. As a result, we identified a core subnetwork comprising 26 genes with the highest interconnectivity and functional relevance. The predictive performance of this combined gene set was validated by achieving high AUC values and strong cross-correlations with external databases. Following a comprehensive review of our 26 CDEGs, informed by prior research, we propose a molecular mechanism underlying glutathione depletion (see Supplementary Discussion and Supplementary Fig. S13). These 26 CDEGs form an intricate network supporting glutathione’s essential roles in detoxification, antioxidant defense, and overall cellular health, offering insights into therapeutic strategies for drug-induced toxicity.
Efforts continue to enhance predictive models by identifying genomic markers that improve the precision of determining causes and mechanisms of deaths due to sudden incidents, overdoses, or drug administration, thus enriching the intersection of medical and forensic genetics (Wendt and Budowle 2020; Di Nunno et al. 2021). ToxAssay has the potential in post-mortem toxicological analyses, aiding in the determination of the cause and manner of drug-related deaths.
A significant limitation of our study is the selection of an adequate number of training compounds to study glutathione depletion due to the lack of available knowledge. It is recommended to use more compounds in the ToxAssay, if possible, to enhance efficiency in toxicity assessment. Another notable limitation is the lack of experimental validation for the results obtained using ToxAssay. Empirical validation is essential to confirm the efficacy of our methodology. Additionally, our research relies heavily on in vivo rat data, which poses a limitation as the results may not fully translate to human subjects. This underscores the need for additional research to evaluate the applicability of our findings to different species. Addressing these limitations is crucial to strengthen the robustness and relevance of our research.
4 Methods
4.1 Toxicogenomics data
Several toxicogenomics databases have emerged, with Open TG-GATEs (Igarashi et al. 2015), DrugMatrix (Ganter et al. 2006), and CTD (Davis et al. 2023) being the leading public resources that establish direct links between toxicity and gene expression data (Alexander-Dann et al. 2018).
In Open TG-GATEs, liver and kidney samples were collected in vivo, where gene expressions of triplicate samples were recorded at four distinct time points (3, 6, 9, and 24 hours) within each of the three dose levels (low, middle and high) after administering a single dose of the studied compound. Repeated doses were also available wherein gene expressions were measured at 4, 8, 15, and 29 days under the same dose levels after the initial administration. In addition, in vitro gene expressions in duplicate rats and human primary hepatocytes are available at three time points (2, 6, and 24 hours) and three dose levels (low, middle, and high).
The DrugMatrix dataset encompass various tissues, including liver, kidney, heart, and bone marrow. In vivo gene expressions of triplicate rat samples under repeat doses were recorded at 0.25, 1, 3, and 5 days (some replaced with 7/14/30/90 days) at two dose levels (low and high*). In vitro* gene expressions of duplicate rat primary hepatocytes were recorded at 16 and 24 hours after treatment with a high dose.
Additionally, ToxAssay utilizes the CTD to identify AOPs. The CTD is a curated collection of various links, including chemical-gene interactions, and chemical-disease and gene-disease associations. As of its most recent 2023 update, CTD encompasses 17 100 chemicals, 54 300 genes, 6100 phenotypes, 7270 diseases, and 202 000 exposure statements.
4.2 Hierarchical linear model (HLM)
Typical perturbation data of compounds in toxicogenomics exhibit a hierarchical, time-and-dose-dependent data structure (compound → dose → time point), revealing complex relationships among treatment, dosage, and time point (Supplementary Fig. S1). In a multilevel framework, gene expression values of the m-th sample at the l-th time point following the administration of the k-th dose of the j-th compound are modeled as follows:
the time point specific intercept ( ) is given by:
the dose specific intercept ( ) is given by:
where is the intercept at the compound level.
Let be a set of compound groups (typically , representing targeted toxicity-positive and toxicity-negative) to be compared for assessing targeted toxicity, where each contains compounds with similar toxicity characteristics. Then, the compound specific intercept ( ) is given by:
where is the general intercept and is the mean effect of the i-th compound group. It is assumed that the residuals are normally distributed, with a mean of zero and a variance of , ie, . Just as the residuals , the intercept residuals are , and . It is assumed that the residuals across different levels are uncorrelated.
In multilevel modelling framework, ICC is a crucial statistical metric, quantifying the level of similarity within clustered data. The overall ICC is determined by , where is the total of intercept variances. When clustering is absent ( ), all observations within compound-dose-time combinations are independent, that is, there is no variation between clusters and thus no dependency. Conversely, in the extreme case of , observations within clusters are identical while those across clusters differ completely, representing total dependency.
4.3 F-statistics of HLM for identifying DEGs
In a fixed-effect HLM, the hypotheses test the effect of all compound groups on gene expression values. The null hypothesis for the F-statistic assumes that the effects of all compound groups are absent ( ) adjusted for the extent of dependency in the data. The alternative hypothesis is that at least one . For each gene, the F-statistic can be derived as follows:
where is the total number of possible clusters in the data and is total number of samples in the i-th group and total sample size using the summation rule is (for derivation see Supplementary Method B). Under the null, has the central F-distribution with and degrees of freedom.
Let and , Equation (5) can be expressed as , where . and can be calculated using the quadratic form as and (for derivation see Supplementary Method C), where and . The quadratic matrices and are not necessarily symmetric. is an identity matrix and is a square matrix of one of order .
Let be the gene expression data matrix of dimension ; is total number of samples and is total number of genes to be tested. To derive the vector of all F-statistics for all genes, instead of calculating F-values based on single gene expression values ( ) in loops, we show that matrix computation of could substantially reduce the computational burden. This is achieved as , where and be the vector of quadratic term for genes. and can be calculated using as and , respectively.
Let be the set of all compounds, where each contains compounds, then can be written as . Let the initial set of significant DEGs identified by the proposed -test , be denoted as , which may also include some genes influenced by chemical-specific outlying expressions. To remove these outlying genes from set, we considered a leave ( )-out cross-validation framework for , as discussed below. We generate DEGs sets, denoted as , where each is computed using the proposed test statistic with the reduced set: , . Finally, the outlier-free reduced DEGs set is obtained as: .
4.4 Adverse outcome pathway (AOP) analysis
We developed an association rule mining algorithm to uncover significant correlations between gene targets in targeted toxicity and disease outcomes, with compounds acting as the linking mediators. Each rule within this framework comprises a trigger (gene) and an outcome (disease), suggesting genes playing a role in adverse outcomes. Here, each compound, along with its associated gene and disease interactions, is treated as a transaction.
Let be set of items which includes both genes (G) and diseases (D), such that and the set of the transactions for each compound is , where each . We generate a transaction matrix , where or is determined by the following:
After generating the transaction matrix using CTD database, similar to the previous study (Oki and Edwards 2016), we apply a quality control step to improve efficiency and accuracy by excluding genes and diseases with activity scores ≤ 1, ie, remove columns where .
In ToxAssay, we focused exclusively on pairwise associations between genes and diseases, considering only rules that involve one gene and one disease. The rule of the form satisfying and . To assess the strength of these associations, we used metrics like lift, support, confidence, and odds ratio (OR) (detailed in Supplementary Method D). We set a minimum support threshold of 0.3 transactions for rule generation. The choice of a 0.3 minimum support threshold for rule generation was selected to ensure that an association rule is identified only if the specific combination of gene expression changes and associated disease outcomes consistently occurs across a meaningful proportion (at least 30%) of the compounds analyzed. This threshold prioritizes robust and statistically significant biological relationships while minimizing the likelihood of identifying random or biologically irrelevant patterns. Such a criterion aligns with common data-mining practices in biological research that emphasize interpretability and reproducibility of results (Chen et al. 2015; Giulia et al. 2021; Qian and Sun 2023). Additionally, we selected association rules with a value of , indicating that the item is over-represented in the sample.
Furthermore, to assess disease-disease similarity, we employed the overlap score between diseases D1 and D2 (Goh et al. 2007):
where represents the set of genes associated with each disease, and denotes the cardinality of the set. Similarly, gene-gene similarity between genes G1 and G2 was calculated using:
where represents the set of diseases associated with each gene. The pairwise similarity is used to compute a distance matrix as . Genes or diseases are then hierarchically clustered, and clusters are obtained by cutting the tree at the desired threshold.
4.5 Functional enrichment and PPI network analysis
To reveal and compare the potential functional roles of the ToxAssay findings, we investigate the enrichment of identified gene signature in four pathway databases, including KEGG (Kanehisa et al. 2017), GO (Harris et al. 2004), Reactome (Croft et al. 2011) and Wikipathway (Slenter et al. 2018) using a false discovery rate (FDR) cutoff of 0.05. Additionally, the resulting GO terms were clustered into groups based on their semantic similarity, using Wang’s method to reduce redundancy in the GO lists (Wang et al. 2007).
To explore gene interactions, a PPI network was constructed using the STRING database with the latest available version 12.0, which contains more than 14 000 organisms and over 67 million proteins (Szklarczyk et al. 2021). The gene set was mapped to STRING to generate experimentally validated interactions among genes/proteins, with a score threshold of 200. Important modules and hub genes in the PPI network can be achieved by adjusting the centrality threshold, including eigenvector and degree. Three widely-used community detection algorithms for networks, namely edge betweenness centrality (EBC) (Girvan and Newman 2002), Walktrap (Clauset et al. 2004), and Fastgreedy (Pons and Latapy 2005), are applied to identify co-regulated subnetworks within the full network. The subnetworks containing less than 5% of the total proteins in the PPI network were combined into a unified miscellaneous subnetwork (SN0).
4.6 Identification of core DEGs (CDEGs)
We isolated key subnetworks from the PPI network to identify the most interconnected and biologically relevant genes from the DEG set, focusing on those with heightened biological and functional significance. Let be the set of genes (coding proteins) in the v-th subnetworks of the PPI network, obtained using the u-th community detection algorithm. The core genes are identified using the following formula:
where represents pathway enrichment in the database, and denotes the cardinality of the set. In this study, we use the GO biological process, but users can select from various databases, including GO, KEGG, Reactome, or WikiPathways. The threshold ( ) represents the minimum percentage of the enrichment score that the subnetworks must capture.
4.7 Prediction model using gene signature
We evaluated the predictive capability of identified gene signature using both supervised and unsupervised learning algorithms, applied to both individual and average gene expression data. Often, researchers prefer to examine and compare mean gene expression values across different levels instead of focusing on individual data points. The average gene expression data for time point, dose, and compound levels are calculated efficiently as , , and , respectively, where , , and (for derivation see Supplementary Method E). Here, , and .
For the supervised classification analysis, logistic regression was employed to differentiate samples treated with glutathione-depleting (positive) versus non-glutathione-depleting (negative) compounds. The log_2_(FC) gene expression values served as independent variables. The dataset was balanced, containing seven compounds in each group, with each compound assessed across three dosages, four time points, and three replicates. This resulted in a total of 252 samples per group, and 504 samples in total. To avoid overfitting, logistic regression performance was evaluated through 100 repetitions of 10-fold cross-validation, with the overall predictive accuracy summarized as the mean AUC derived from these 1000 validation runs (100 repeats × 10 folds).
For the unsupervised hierarchical clustering analysis, Ward’s agglomerative method (“ward.D2” linkage) implemented in the R function hclust was applied (Murtagh and Legendre 2014). The analysis was conducted using data exclusively from the high-dose level, performed across all measured time points, and specifically for the maximum time points, where pathological alterations are most evident. Mean gene expression values across the three replicates per treatment condition were calculated and used. The optimal cluster number was determined primarily by dendrogram visual inspection, supplemented by silhouette scores for additional validation.
4.8 External validation of gene signature
To validate the ToxAssay gene signature, we performed a cross-correlation analysis comparing gene expression data from the Open TG-GATEs dataset with an external dataset from DrugMatrix. Specifically, we focused on Methimazole, a known hepatotoxicant associated with glutathione depletion. Methimazole was selected as it appears in both datasets, allowing for a direct comparison of gene expression perturbations in both experimental systems.
The analysis aimed to assess the consistency and robustness of the ToxAssay gene signature across independent toxicogenomic platforms. By comparing gene expression profiles in both datasets, we evaluated the generalizability of the ToxAssay signature in predicting glutathione depletion-related hepatotoxicity. The strong correlation in gene expression perturbations across datasets confirms the reliability and validity of the ToxAssay signature as a tool for identifying biomarkers and predicting drug-induced toxicity.
4.9 ToxAssay implementation and functionalities
ToxAssay was implemented as a user-friendly, publicly accessible, operating system independent R package, available at https://github.com/Fun-Gene/toxassay. Detailed information about its functionalities can be found in Supplementary Method A and package demonstration.
4.10 Simulations
We conducted extensive simulations to investigate the performance of ToxAssay in detecting gene signature under a variety of scenarios. The synthetic dataset was simulated in such a way that it imitates the structure of a toxicogenomics dataset. For each simulation scenario, we generated a gene expression dataset consisting of genes from two independent compound groups: one with samples exposed to toxicity-positive compounds and the other to toxicity-negative compounds, to study targeted toxicity.
The compound exposure gene expression values of each data point were simulated to resemble in vivo single-dose data from Open TG-GATEs experiments. Therefore, the core of our simulations is to simulate the gene expression matrix of samples and genes comprising DEGs and non-DEGs, where . The expression values for each gene were generated from a Gaussian distribution , where the intercept of the time level was simulated from a Gaussian distribution , and the intercept of the dose level was simulated from , and the intercept of the compound level was simulated from . The overall intercept ( ) across all compounds was simulated from the uniform distribution and is defined using effect size ( ).
We used a standardized model where observations are normalized to a mean of 0 and standard deviation of 1, allowing the total of intercept variances ( ) approximate the overall ICC ( ). The variance of intercepts , and were simulated from a uniform distribution, with the condition . As we used a standardized model, the residual variance was generated from Gaussian distribution such that , where the explained variance can be approximated by the effect size ( ) as: (Aarts et al. 2014). Varying effect sizes were applied to a proportion ( ) of the genes to establish true effects ( ) for DEGs. For non-DEGs, we set and assume that (genes are equally expressed across all levels).
4.11 Application of ToxAssay in glutathione depletion analysis
Glutathione, a tripeptide composed of glutamate, cysteine, and glycine, plays a central role in defending against the toxicity of drugs and oxidants (Dickinson and Forman 2002; Forman et al. 2009). Upon exposure to toxic compounds, glutathione is depleted through conjugation with these substances, impairing antioxidant defenses and increasing the risk of drug-induced hepatotoxicity and other adverse effects. Understanding this mechanism is essential for developing strategies to enhance antioxidant protection and reduce drug toxicity. Consequently, glutathione depletion remains a key endpoint in drug-induced liver injury and is recognized as an early indicator of hepatotoxicity during drug development.
To investigate biomarker genes and illuminate the functional and molecular mechanisms related to glutathione depletion, we selected two sets of compounds: one comprising compounds known to conjugate with or deplete glutathione (positive set), and the other consisting of compounds not influenced by glutathione levels (negative set). The positive compounds were acetaminophen, bromobenzene, coumarin, methapyrilene, nitrofurazone, phorone, and diethyl maleate; the negative set included erythromycin- ethylsuccinate, gentamicin, glibenclamide, hexachlorobenzene, isoniazid, penicillamine, and perhexiline. All compounds have been previously employed in glutathione depletion studies (Kiyosawa et al. 2004; 2007; Gao et al. 2010; Nyström-Persson et al. 2013; Hasan et al. 2018; Rana et al. 2018). Following a similar design approach to previous research, we utilized the “rat/in vivo/liver/single” dataset of compounds from the Open TG-GATEs database for glutathione depletion analysis. The dataset consists of seven compounds × three doses × four time points × three replicates, resulting in 252 samples in both the positive and negative groups, for a total of 504 samples. Details regarding the training compounds can be found in Supplementary Table S7.
Supplementary Material
btaf561_Supplementary_Data
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Aarts E , Verhage M, Veenvliet JV et al A solution to dependency: using multilevel analysis to accommodate nested data. Nat Neurosci. 2014;17:491–6.24671065 10.1038/nn.3648 · doi ↗ · pubmed ↗
- 2Ahmed MME , Wang T, Luo Y et al Aldo-keto reductase-7A protects liver cells and tissues from acetaminophen-induced oxidative stress and hepatotoxicity. Hepatology 2011;54:1322–32.21688283 10.1002/hep.24493 · doi ↗ · pubmed ↗
- 3Alexander-Dann B , Pruteanu LL, Oerton E et al Developments in toxicogenomics: understanding and predicting compound-induced toxicity from gene expression data. Mol Omics 2018;14:218–36.29917034 10.1039/c 8mo 00042 e PMC 6080592 · doi ↗ · pubmed ↗
- 4Aoiadni N , Ayadi H, Jdidi H et al Flavonoid-rich fraction attenuates permethrin-induced toxicity by modulating ROS-mediated hepatic oxidative stress and mitochondrial dysfunction ex vivo and in vivo in rat. Environ Sci Pollut Res Int. 2021;28:9290–312.33136269 10.1007/s 11356-020-11250-9 · doi ↗ · pubmed ↗
- 5Brennan MS , Matos MF, Richter KE et al The NRF 2 transcriptional target, OSGIN 1, contributes to monomethyl fumarate-mediated cytoprotection in human astrocytes. Sci Rep. 2017;7:42054.28181536 10.1038/srep 42054 PMC 5299414 · doi ↗ · pubmed ↗
- 6Canesin G , Feldbrügge L, Wei G et al Heme oxygenase-1 mitigates liver injury and fibrosis via modulation of LNX 1/Notch 1 pathway in myeloid cells. i Science 2022;25:104983.36093061 10.1016/j.isci.2022.104983 PMC 9450142 · doi ↗ · pubmed ↗
- 7Chen M , Suzuki A, Thakkar S et al DIL Irank: the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans. Drug Discov Today. 2016;21:648–53.26948801 10.1016/j.drudis.2016.02.015 · doi ↗ · pubmed ↗
- 8Chen S-C , Tsai T-H, Chung C-H et al Dynamic association rules for gene expression data analysis. BMC Genomics 2015;16:786.26467206 10.1186/s 12864-015-1970-x PMC 4606551 · doi ↗ · pubmed ↗
