ADFC‐ATP: Attention‐Guided Dual‐View Fusion and Contrastive Pretraining for Robust Aquatic Toxicity Prediction
Jixuan Jia, Xin Yang, Ying Fang, Honghong Su, Qi Zhao

TL;DR
ADFC-ATP is a new deep learning framework that improves the prediction of aquatic toxicity by combining molecular graph data with attention-based fusion and contrastive learning.
Contribution
The novel framework integrates dual-view graph fusion and contrastive pretraining to enhance robustness and accuracy in aquatic toxicity prediction.
Findings
ADFC-ATP achieves a 10.2% average AUC improvement over baseline models on fish toxicity datasets.
Attention-based fusion and scaffold preservation are critical for model performance and interpretability.
The model identifies toxicophores consistent with QSAR principles, enhancing chemical risk assessment.
Abstract
The rising levels of chemical pollutants in aquatic ecosystems threaten biodiversity and demand improved methods for assessing ecological risk. Recent deep learning methods advance molecular toxicity prediction but still suffer from limited generalisation, interpretability and robustness under data scarcity. To address these issues, we propose ADFC‐ATP, a framework that integrates dual‐view molecular graph fusion with contrastive topology learning based on NT‐Xent loss. Our approach uses structural graph augmentations during pretraining to enhance robustness, while a graph attention encoder learns hierarchical substructure patterns through masked feature reconstruction. For downstream aquatic toxicity prediction, an adaptive attention‐based fusion mechanism dynamically combines pretrained graph embeddings and fingerprint similarity metrics, enabling more accurate and robust toxicity…
Table 1. Parameter settings for pretraining.

| Parameter | Description | Range |
|---|---|---|
| Batch_size | Input batch size | {100} |
| Epochs | Warm‐up epoch | {50} |
| Learning_rate | Initial learning rate | {0.001} |
| Weight_decay | Weight decay for Adam | {0.000001} |
| Augment_ratio | Data enhancement ratio | {0.5} |
| Model_type | GNN backbone | {gat} |
| P_dropout | Dropout ratio | {0.2} |
| Pool | Readout pooling | {mean, max, add} |
| Temperature | Temperature of NT‐Xent loss | {0.1} |

Table 2. Parameter settings for fine‐tuning.

| Parameter | Description | Range |
|---|---|---|
| Batch_size | Input batch size | {64, 128} |
| Learning_rate | Initial learning rate | {0.001} |
| Weight_decay | Weight decay for Adam | {0.0001} |
| Epochs | Total number of epochs | {50, 150} |
| P_dropout | Dropout ratio | {0.2} |
| Pool | Readout pooling | {mean, max, add} |
| Seed | Random seed | {42} |

| Fish species | Models | AUC | ACC | PR | RE |
|---|---|---|---|---|---|
| RT | ADFC‐ATP / CLSSATP | 0.912 | 0.864 | 0.809 / 0.805 | 0.905 / 0.762 |
| RT | ATFPGT | 0.928 | 0.854 | | 0.770 |
| RT | GCN‐MT | 0.804 | 0.719 | 0.675 | 0.799 |
| RT | GCN‐ST | 0.835 | 0.787 | 0.643 | 0.872 |
| RT | SVM‐KRFP | 0.794 | 0.827 | 0.667 | 0.922 |
| RT | RF‐MACCS | 0.822 | 0.849 | 0.714 | |
| RT | RF‐KRFP | 0.800 | 0.831 | 0.679 | 0.922 |
| FHM | ADFC‐ATP / CLSSATP | 0.894 | 0.774 | 0.846 | 0.821 |
| FHM | ATFPGT | 0.881 | 0.779 | 0.902 | 0.729 |
| FHM | GCN‐MT | 0.744 | 0.505 | 0.870 | 0.454 |
| FHM | GCN‐ST | 0.847 | 0.770 | 0.812 | 0.734 |
| FHM | SVM‐PubChem | 0.787 | 0.787 | 0.788 | 0.787 |
| FHM | SVM‐KRFP | 0.832 | 0.833 | 0.812 | 0.851 |
| FHM | RF‐PubChem | 0.762 | 0.764 | 0.738 | 0.787 |
| SHM | ADFC‐ATP / CLSSATP | 0.910 | 0.824 / 0.859 | 0.667 | 0.909 / 0.791 |
| SHM | ATFPGT | 0.906 | | 0.700 | 0.678 |
| SHM | GCN‐MT | 0.693 | 0.604 | 0.733 | 0.566 |
| SHM | GCN‐ST | 0.859 | 0.881 | 0.625 | 0. |
| SHM | ANN‐CDK | 0.743 | 0.797 | 0.625 | 0.860 |
| SHM | RF‐PubChem | 0.786 | 0.831 | 0.688 | 0.884 |
| SHM | SVM‐CDKExt | 0.786 | 0.831 | 0.688 | 0.884 |
| BS | ADFC‐ATP / CLSSATP | 0.906 | 0.885 | 0.824 / 0.841 | 0.802 |
| BS | ATFPGT | 0.932 | 0.869 | | 0.801 |
| BS | GCN‐MT | 0.796 | 0.674 | 0.599 | 0.880 |
| BS | GCN‐ST | 0.875 | 0.826 | 0.727 | 0.880 |
| BS | ANN‐KRFP | 0.775 | 0.794 | 0.709 | 0.840 |
| BS | SVM‐KRFP | 0.780 | 0.800 | 0.709 | 0.850 |
| BS | RF‐KRFP | 0.761 | 0.787 | 0.673 | 0.850 |

| Fish species | Method | AUC | ACC | PR | RE |
|---|---|---|---|---|---|
| RT | ADFC‐ATP | | | 0.809 | |
| RT | ADFC‐ATP_0.3 | 0.934 | 0.871 | | 0.810 |
| RT | ADFC‐ATP_0.0 | 0.925 | 0.815 | 0.686 | 0.833 |
| RT | ADFC‐ATP_NPL | 0.803 | 0.823 | 0.679 | 0.905 |
| RT | ADFC‐ATP_ECFP | 0.725 | 0.702 | 0.564 | 0.624 |
| FHM | ADFC‐ATP | | | 0.849 | |
| FHM | ADFC‐ATP_0.3 | 0.915 | 0.739 | | 0.857 |
| FHM | ADFC‐ATP_0.0 | 0.907 | 0.830 | 0.851 | 0.816 |
| FHM | ADFC‐ATP_NPL | 0.824 | 0.819 | 0.820 | 0.837 |
| FHM | ADFC‐ATP_ECFP | 0.732 | 0.649 | 0.648 | 0.714 |
| SHM | ADFC‐ATP | | | 0.667 | |
| SHM | ADFC‐ATP_0.3 | 0.906 | 0.817 | | 0.818 |
| SHM | ADFC‐ATP_0.0 | 0.901 | 0.814 | 0.667 | 0.909 |
| SHM | ADFC‐ATP_NPL | 0.820 | 0.803 | 0.714 | 0.909 |
| SHM | ADFC‐ATP_ECFP | 0.704 | 0.676 | 0.550 | 0.636 |
| BS | ADFC‐ATP | | | | |
| BS | ADFC‐ATP_0.3 | 0.894 | 0.871 | 0.722 | 0.839 |
| BS | ADFC‐ATP_0.0 | 0.895 | 0.843 | 0.611 | 0.701 |
| BS | ADFC‐ATP_NPL | 0.808 | 0.843 | 0.793 | 0.742 |
| BS | ADFC‐ATP_ECFP | 0.751 | 0.640 | 0.489 | 0.742 |

Funding
- Science and Technology Plan Project of Liaoning Province
- Fundamental Research Funds for the Liaoning Universities
1 Introduction
The large‐scale production and use of synthetic chemicals have greatly benefited modern society but have also raised significant ecological and environmental concerns [1]. Notably, persistent organic pollutants such as polycyclic aromatic hydrocarbons, phthalates and perfluorooctanoic acid are frequently detected in the environment due to their persistence and mobility, posing risks to ecosystems and human health. As a result, agencies such as the EPA and the OECD require comprehensive ecotoxicity assessments for chemicals before they enter the market [2]. These pollutants persist in ecosystems through water cycles, atmospheric transport and bioaccumulation [3]. Aquatic organisms such as fish take up these chemicals through dermal absorption, gill respiration and ingestion, accumulating them predominantly in the liver, adipose tissue and nervous system. This leads to direct toxic effects, including growth inhibition, reproductive dysfunction and behavioural abnormalities, and potentially threatens human health via biomagnification through the food chain [4, 5].
Traditional toxicity assessment methods predominantly rely on in vivo experiments using model organisms such as zebrafish and medaka [6]. These experiments are not only time‐consuming and resource‐intensive but also raise ethical and animal welfare concerns. With thousands of new chemicals introduced globally each year, this animal‐based evaluation approach increasingly struggles to meet rising assessment demands, making efficient and accurate alternative methods an urgent need [7, 8]. In this context, computational toxicology offers a new paradigm by establishing quantitative structure–activity relationship (QSAR) models that link molecular structural features with toxicity endpoints for efficient risk assessment [9]. Recent advances in computational power and big data technologies have driven significant successes for deep learning in fields such as computational toxicology [10, 11, 12, 13, 14], biomacromolecule interaction prediction [15, 16, 17], metabolite‐disease association prediction [18, 19] and remote health monitoring [20, 21], further transforming toxicity prediction methodologies. Compared to traditional approaches, deep neural networks extract features automatically and can model complex molecular structure. They can integrate multiple data sources, for example by jointly analysing molecular fingerprints and graph representations, and they support transfer learning, which notably improves generalisability in small‐sample scenarios.
Currently, computational toxicology mainly uses two molecular representation approaches: fingerprint‐ or descriptor‐based feature vectors, and graph neural network‐based molecular graph representations [22]. The former is simple and interpretable, while the latter automatically extracts complex structural information but is less interpretable. In this study, we employ the AttentiveFP (Attentive Fingerprint) network to extract structural features of molecules. AttentiveFP, proposed by Xiong et al. in 2020, is an attention‐based graph neural network for learning molecular fingerprints [23]. Unlike traditional fingerprints, AttentiveFP automatically aggregates atomic and neighbourhood information, capturing both local structural motifs and global topological information and yielding a more discriminative and generalisable molecular representation. Compared with fixed fingerprint representations, neural fingerprints with attention mechanisms learn the substructural features most relevant to downstream tasks through end‐to‐end training [24]. Through multi‐level aggregation, they capture non‐local dependencies, resulting in higher expressive power and adaptability, and the attention mechanism also improves interpretability [25]. The other approach represents each molecule as a topological graph with atoms as nodes and chemical bonds as edges, and uses a graph attention network (GAT) for end‐to‐end modelling of spatial relationships. In 2017, Gilmer et al. proposed the message passing neural network (MPNN), a deep learning framework designed for molecular property prediction [26]. The core idea of MPNN is to use a graph neural network architecture to jointly model the node and edge features of a molecule in order to predict its physicochemical properties.
They unified multiple existing graph neural network models into a single MPNN framework, enabling comparison and improvement of different graph neural network methods. In 2022, Wu et al. developed an interpretable deep learning model based on the MPNN architecture to predict aquatic toxicity across multiple species [27]. By integrating SMILES‐encoded molecular structures with a neural attention mechanism, this model achieves high predictive accuracy on toxicity endpoints for fish, daphnia and algae. Additionally, a SHAP‐based interpretability analysis identifies substructures highly associated with toxicity, highlighting the potential of attention‐based GNNs for both performance and explainability in ecotoxicology. Building on these advances in graph neural architectures, Tang et al. proposed, during the same period, a multi‐task graph neural network to predict acute chemical toxicity across multiple aquatic species. The model uses a shared GNN encoder and separate output heads for different species, achieving better performance than traditional QSAR and single‐task models, particularly under low‐data conditions [28]. Later, Xu et al. investigated multiple machine learning and deep learning methods for predicting chemical aquatic toxicity [29]. Using the ECOTOX database and published literature, they collected acute toxicity data (LC50) for 1874 compounds on four fish species and classified compounds as toxic or non‐toxic according to an LC50 threshold. Their study systematically compares the performance of various machine learning models and graph convolutional networks (GCNs) in aquatic toxicity classification. The results show that single‐task GCN achieves higher predictive accuracy than traditional methods, while multi‐task GCN does not demonstrate significant advantages in multi‐species fusion tasks. However, these models have certain limitations.
For example, the applicability of global models is constrained by interspecies toxicity differences, and cross‐species prediction becomes less robust, especially in scenarios with sparse data or novel species. Consequently, in 2024, Yang et al. proposed a multi‐task deep neural network model named ATFPGT‐multi to simultaneously predict 96 h‐LC50 acute toxicity of organic compounds to bluegill sunfish, rainbow trout, fathead minnow and sheepshead minnow [30]. This model first extracts Morgan, MACCS, and RDKit fingerprints and then reduces their dimensionality using a multilayer perceptron to generate fingerprint embeddings. Concurrently, it encodes the molecular graphs with a GNN combined with a transformer‐based graph attention mechanism to obtain graph embeddings. Finally, it fuses these two embeddings and employs four independent classification heads for multi‐task learning. However, existing molecular representation methods often suffer from inadequate structural encoding, limited semantic understanding and poor generalisation across species, which constrain the performance of aquatic toxicity prediction. In 2025, Lin et al. tackled the problem of limited feature representation in aquatic toxicity prediction by combining contrastive and self‐supervised learning strategies [31]. This approach leverages both molecular fingerprints and molecular graphs, deeply extracting and integrating local structural features and global spatial information from two perspectives. Although their method improves the accuracy and generalisation of multi‐species aquatic toxicity prediction through the combination of contrastive and self‐supervised learning, it still has certain limitations. Fingerprint‐based approaches often overlook spatial relationships within molecules, while graph‐based models may lack sufficient sensitivity to the activities of specific functional groups. 
Effectively integrating molecular functional attributes with spatial topological information [32], and building predictive models that are both interpretable and generalisable, remain major scientific challenges in computational toxicology.
To address these challenges, we propose ADFC‐ATP, which integrates contrastive learning and self‐supervised learning and applies diverse augmentation strategies to the pretraining data. Specifically, our method employs a scaffold‐preserving strategy that maintains the Murcko core scaffold, ensuring that key structural information is retained. Additionally, ADFC‐ATP introduces functional‐group perturbations to simulate side‐chain diversity and generate chemically plausible novel samples. A contrastive‐learning fusion mechanism then adaptively injects the augmented information into the original sample features, enhancing the embedding space's sensitivity to both local and global structural variations. Self‐supervised learning mines the inherent spatial information in molecular graphs, enabling our model to develop a comprehensive understanding of molecular structure. Moreover, a transfer‐learning framework transfers knowledge pretrained on roughly 300,000 unlabelled molecules to the small‐sample toxicity prediction tasks, demonstrating strong potential in environmental toxicity assessment. Experimental results show that ADFC‐ATP outperforms baseline models across multiple aquatic toxicity benchmarks, with an average AUC improvement of approximately 10.2%. This improvement highlights ADFC‐ATP's ability to capture critical molecular features and its robustness in handling complex aquatic toxicity prediction tasks, even with small or imbalanced datasets.
2 Materials and Methods
2.1 Data Preparation
We curate a comprehensive pretraining dataset that consists of 306,347 unique molecules from the substance channel of the ZINC database [33]. Each molecule in this collection is reported or inferred to exhibit bioactivity at 10 μM or better in direct binding assays. To ensure both diversity and relevance, we include only molecules with high‐confidence activity annotations, and we filter out compounds with ambiguous or conflicting assay results. The dataset covers a broad chemical space and contains a wide range of scaffolds, functional groups and molecular properties. This diversity is critical for robust model pretraining, as it enables ADFC‐ATP to learn generalisable chemical representations that are not biased towards a specific target or assay condition. For molecular graph construction, we utilise the open‐source RDKit toolkit to convert each molecule's SMILES string into a graph, with nodes representing atoms and edges representing chemical bonds. These molecular graphs are subsequently used as inputs for the GAT module.
Our fine‐tuning dataset, used to evaluate downstream task performance, is derived from the ECOTOX database and covers four fish species: bluegill sunfish (Lepomis macrochirus, BS), rainbow trout (Oncorhynchus mykiss, RT), fathead minnow (Pimephales promelas, FHM) and sheepshead minnow (Cyprinodon variegatus, SHM) [34]. The chosen toxicity endpoint is the 96‐h LC50 value. To maintain dataset precision and eliminate redundancy from repeated experiments, we apply the following preprocessing steps using the RDKit package:
- Removal of salts and inorganic compounds to exclusively retain organic substances.
- Elimination of all isomeric forms, including stereoisomers and cis‐trans isomers, to streamline molecular representation and analysis.
- Standardisation of molecular structure representation through canonical SMILES notation, preparing molecules for the subsequent analyses.
- Consolidation of identical molecule records to decrease redundancy and enhance dataset coherence.
- Classification of organic compounds based on the 92/32/EEC criterion, designating substances with a 96 h‐LC50 below 10 mg/L as ‘toxic’ and others as ‘nontoxic’.
Following these preprocessing steps, the final counts for BS, RT, FHM and SHM are 892, 1236, 932 and 343 compounds, respectively. The toxic to nontoxic ratio for BS, RT and SHM is around 6:4, while for FHM it is nearly balanced at 1:1. Figure 1 illustrates the distribution proportions of each fish species within the total aquatic toxicity dataset, alongside the respective numbers of toxic and nontoxic substances for each species.
Figure 1. (a) The proportion of data for each fish species. (b) The number of non‐toxic and toxic compounds for each fish species.
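The last two preprocessing steps (consolidation and labelling) can be sketched as follows. This is a minimal illustration, assuming the records have already been desalted, stripped of stereochemistry and canonicalised with RDKit. The record layout, the function name `consolidate_and_label` and the use of a geometric mean to merge repeated measurements are assumptions made for the example; the text states only that duplicates are consolidated and that a 96 h‐LC50 below 10 mg/L is labelled toxic.

```python
import math
from collections import defaultdict

TOXIC_THRESHOLD_MG_L = 10.0  # 96 h-LC50 cut-off stated in the text


def consolidate_and_label(records):
    """Merge repeated measurements per canonical SMILES and assign a label.

    `records` is an iterable of (canonical_smiles, lc50_mg_per_l) pairs.
    Assumption: replicates are merged with the geometric mean.
    Returns {smiles: (merged_lc50, label)} with label 1 = toxic, 0 = nontoxic.
    """
    grouped = defaultdict(list)
    for smiles, lc50 in records:
        if lc50 <= 0:
            raise ValueError(f"LC50 must be positive, got {lc50} for {smiles}")
        grouped[smiles].append(lc50)

    labelled = {}
    for smiles, values in grouped.items():
        merged = math.exp(sum(math.log(v) for v in values) / len(values))
        labelled[smiles] = (merged, int(merged < TOXIC_THRESHOLD_MG_L))
    return labelled


# Illustrative values only, not taken from the ECOTOX data.
records = [
    ("Oc1ccccc1", 4.0),   # two measurements of the same compound
    ("Oc1ccccc1", 9.0),
    ("CCO", 13000.0),
]
dataset = consolidate_and_label(records)
```

Merging in log space is one common choice for concentration data; a median or arithmetic mean would slot into the same place.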
2.2 Model Framework
The framework of ADFC‐ATP, designed to predict toxicity for four fish species, is depicted in Figure 2. It consists of three main components: a multi‐view molecular graph augmentation and fusion module, a contrastive learning module based on molecular graph features, and a downstream task module. First, to enhance the model's ability to learn and generalise molecular structures, we propose a scaffold‐preserving data augmentation strategy based on the Murcko scaffold. Specifically, we introduce structural diversity in the augmented samples via perturbations such as edge reconnection and functional‐group substitution, while preserving each molecule's Murcko core scaffold. These operations generate two distinct augmented graphs. This approach enables ADFC‐ATP to better capture the relationship between molecular scaffold structures and their biological activities, thereby improving predictive performance. The fingerprint similarity between the original graph and each of the two augmented graphs is calculated to obtain the weight coefficients $\alpha_1$ and $\alpha_2$. Based on these coefficients, node features and adjacency matrices are linearly mixed, and the final embeddings are obtained through training, followed by adaptive weighted fusion using node‐level cosine similarity. Second, contrastive learning is combined with masking strategies, forcing the model to learn the intrinsic relationships within the molecule and their association with global molecular properties, thereby enhancing its understanding of global molecular features. At the same time, self‐supervised learning on molecular graphs trains ADFC‐ATP to capture structural features, local and global relationships, and chemical reactivity information of the molecules [35].
Finally, we transfer the pretrained GAT parameters to the downstream task and fine‐tune them on the original molecular graphs to obtain refined graph structural embeddings. Simultaneously, molecular fingerprint features are encoded using a multilayer perceptron (MLP), and the AttentiveFP network directly extracts a neural fingerprint of each molecule as another view of the features. Through an attention‐based fusion mechanism, the fine‐tuned graph embedding and the AttentiveFP fingerprint are then adaptively mixed in a weighted manner, resulting in a unified and more discriminative molecular representation for aquatic toxicity prediction. In summary, our model is pretrained under a dual‐view data augmentation strategy, which rearranges molecular functional groups to simulate chemical diversity and thereby enhances the robustness of the model. In the downstream task, we combine molecular fingerprints generated by AttentiveFP with molecular graph features obtained from the fine‐tuned GAT to predict toxicity. By leveraging AttentiveFP‐generated fingerprints, which offer greater expressive power and adaptability than traditional fingerprints, together with the interpretability provided by the attention mechanism [36], our model achieves improved performance and interpretability in toxicity prediction.
Figure 2. The workflow of ADFC‐ATP. (a) Multiple structural perturbations, data augmentation and feature fusion. (b) Contrastive self‐supervised pretraining based on multi‐view augmentation. (c) Transferring the pretrained features to downstream supervised aquatic toxicity tasks.
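The attention‐based fusion of the two downstream views can be illustrated with a small sketch. The text does not specify the exact scoring function, so the version below is only one plausible form: each view is scored against a query vector (which would be a learned parameter in a trained model), the scores are turned into weights by a softmax, and the fused representation is the weighted sum. The names `attention_fuse` and `query` and the tanh scoring are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(graph_emb, fp_emb, query):
    """Fuse two equal-length view embeddings with scalar attention weights.

    Assumption: each view is scored as query . tanh(view); in a trained
    model `query` would be learned rather than fixed.
    Returns (fused_vector, [weight_graph, weight_fingerprint]).
    """
    if not (len(graph_emb) == len(fp_emb) == len(query)):
        raise ValueError("all vectors must share one dimension")
    views = [graph_emb, fp_emb]
    scores = [sum(q * math.tanh(x) for q, x in zip(query, v)) for v in views]
    weights = softmax(scores)
    fused = [
        sum(w * v[i] for w, v in zip(weights, views))
        for i in range(len(query))
    ]
    return fused, weights


# Toy embeddings standing in for the fine-tuned GAT and AttentiveFP outputs.
graph_emb = [0.2, -0.1, 0.7]
fp_emb = [0.5, 0.3, -0.4]
query = [1.0, 0.5, -0.5]
fused, weights = attention_fuse(graph_emb, fp_emb, query)
```

Because the weights sum to one, every component of the fused vector lies between the corresponding components of the two views, and the weights themselves can be read as a per‐molecule indication of which view dominated.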
2.3 Dual‐View Data Augmentation
Molecular graphs map chemical molecular structures into a graph‐theoretic framework and are widely used in cheminformatics and computational chemistry. In this study, to fully leverage large amounts of unlabelled molecular data, we design a self‐supervised pretraining pipeline based on molecular graphs. Its core idea is that, by applying graph‐level data augmentation and contrastive learning, our model learns robust structural representations without any labels. Figure 2a illustrates the entire augmentation workflow. First, we use RDKit to convert each SMILES string into an RDKit molecule object, map it to an undirected graph and extract node‐level features such as one‐hot atom types, formal charges, hydrogen counts, aromaticity flags, etc. to obtain a set of original graphs G. To create rich positive pairs for contrastive training, we perform two complementary augmentations on selected originals.
- Scaffold augmentation: We extract the Murcko scaffold and then randomly reconnect a proportion of bonds in the molecular adjacency matrix while preserving the core scaffold structure.
- Group augmentation: We randomly replace edges corresponding to non‐scaffold functional groups in the original graph, simulating side‐chain diversity.
Each augmentation iteratively produces one perturbed graph, ensuring that the node count matches the original and discarding any chemically invalid structures via a validity check. After obtaining the two augmented graphs $G_{sca}$ and $G_{gro}$, we separately compute the RDKit fingerprint similarity between the original graph $G$ and each augmented graph, i.e., we calculate $\mathrm{sim}_{sca}(\mathrm{Finger}, \mathrm{Finger}_{sca})$ and $\mathrm{sim}_{gro}(\mathrm{Finger}, \mathrm{Finger}_{gro})$. These two similarity values are then normalised to obtain the two weight coefficients $\alpha_1$ and $\alpha_2$. We linearly combine the corresponding node feature vectors according to these weights and perform an analogous weighted mixing on the adjacency matrices. Finally, we threshold the mixed adjacency matrix and convert it back to sparse format. The result is a fused augmented graph $G'$ that jointly encodes both the molecule's structural topology and its chemical information.
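The fusion step described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: fingerprints are represented as sets of on‐bit indices, similarity is taken to be Tanimoto (the text says only "RDKit fingerprint similarity"), the weights are normalised so that they sum to one, and the adjacency threshold of 0.5 is a placeholder. The function names are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 1.0


def fuse_augmented_graphs(fp, fp_sca, fp_gro, x_sca, x_gro, a_sca, a_gro,
                          threshold=0.5):
    """Mix two augmented graphs with similarity-derived weights.

    x_* are node-feature matrices (lists of rows) and a_* dense adjacency
    matrices; both graphs must have the same node count.
    Returns (alpha1, alpha2, mixed_features, edge_list).
    """
    sim_sca, sim_gro = tanimoto(fp, fp_sca), tanimoto(fp, fp_gro)
    total = sim_sca + sim_gro
    # Fall back to equal weights if both similarities are zero.
    alpha1, alpha2 = (sim_sca / total, sim_gro / total) if total else (0.5, 0.5)

    def mix(m1, m2):
        return [[alpha1 * u + alpha2 * v for u, v in zip(r1, r2)]
                for r1, r2 in zip(m1, m2)]

    x_mix = mix(x_sca, x_gro)
    a_mix = mix(a_sca, a_gro)
    # Threshold the mixed adjacency and keep it in sparse (edge-list) form.
    edges = [(i, j) for i, row in enumerate(a_mix)
             for j, w in enumerate(row) if w >= threshold]
    return alpha1, alpha2, x_mix, edges


# Toy three-node example; bit sets and matrices are illustrative only.
fp, fp_sca, fp_gro = {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 6, 7}
x_sca = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
x_gro = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
a_sca = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
a_gro = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
alpha1, alpha2, x_mix, edges = fuse_augmented_graphs(
    fp, fp_sca, fp_gro, x_sca, x_gro, a_sca, a_gro)
```

In this toy case the scaffold view is more similar to the original (0.6 against roughly 0.33), so an edge present only in the scaffold view survives thresholding while an edge present only in the group view is dropped.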
2.4 Graph Attention Network
GAT extends the standard GNN neighbourhood aggregation operation by introducing a self‐attention mechanism, which enables the model to assign learnable weights to different neighbours and thus distinguish their importance in message passing. In this study, we represent each molecule as an undirected graph $G = (V, E)$, where the node set $V$ corresponds to atoms and the edge set $E$ corresponds to chemical bonds. Initially, each node $v$ carries a feature vector $h_v$ based on atom type, charge and heterocycle information, which serves as the atomic representation input to the current GAT layer. To fully exploit structural information in molecular graphs, we employ a GAT in place of traditional GNN aggregation methods. GAT performs message passing by assigning learnable attention weights to neighbouring nodes as follows:
For node $v$ and each neighbour $u \in \mathcal{N}_v$, we first compute the unnormalised attention coefficient $e_{uv}$, where $W^{k} \in \mathbb{R}^{d' \times d}$ is the linear transformation matrix at layer $k$, $a^{k} \in \mathbb{R}^{2d'}$ is the learnable attention vector, and $\Vert$ denotes vector concatenation. $e_{uv}$ measures the contribution of neighbour $u$ to the representation update of the centre node $v$. It is computed by applying the shared linear transformation to the feature vectors $h_v$ and $h_u$, concatenating the results, and then passing them through the learnable attention mechanism. Finally, we update node $v$'s feature vector $h_v$ as:
where $\sigma(\cdot)$ represents the ReLU activation function. To enhance robustness and expressiveness, we run $H$ independent attention heads in parallel at each layer and concatenate their outputs to obtain the final node representation for that layer. After $L$ layers of GAT updates, we apply a graph‐level readout operation to aggregate all final node features into a global molecular representation $h_G$:
This design follows the classic GAT architecture introduced by Veličković [34], combining multi‐head attention and readout operations to deliver powerful molecular representations for downstream toxicity prediction.
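For reference, the layer update described in this subsection can be written in the classic GAT form. This is a sketch under stated assumptions rather than the authors' exact equations: the LeakyReLU nonlinearity, the softmax normalisation and the symbol $\alpha_{uv}$ (unrelated to the fusion weights $\alpha_1$ and $\alpha_2$) come from the standard formulation, and READOUT stands for whichever pooling (mean, max or add) is selected.

```latex
% Unnormalised attention coefficient from neighbour u to centre node v
e_{uv} = \mathrm{LeakyReLU}\!\left( (a^{k})^{\top} \left[ W^{k} h_v \,\Vert\, W^{k} h_u \right] \right)

% Normalisation over the neighbourhood of v (assumed softmax)
\alpha_{uv} = \frac{\exp(e_{uv})}{\sum_{w \in \mathcal{N}_v} \exp(e_{wv})}

% Node update, with \sigma the ReLU activation
h_v' = \sigma\!\left( \sum_{u \in \mathcal{N}_v} \alpha_{uv} \, W^{k} h_u \right)

% Graph-level readout after L layers
h_G = \mathrm{READOUT}\!\left( \{ h_v^{(L)} : v \in V \} \right)
```

The dimensions are consistent with the definitions in the text: $W^{k} h_v$ and $W^{k} h_u$ each have $d'$ entries, so their concatenation matches the $2d'$ entries of $a^{k}$.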
2.5 Contrastive Learning
To enhance the performance of GAT molecular graph feature learning, we introduce a contrastive learning framework, as illustrated in Figure 2b. The core idea is to train the model on augmented sample pairs so that it better captures global features of molecular graphs, maximising the similarity between positive sample pairs while minimising the similarity between negative sample pairs. In this study, we use data augmentation techniques to generate multiple views of each molecular graph. Positive pairs are two augmented views derived from the same molecule, whereas negative pairs are views originating from different molecules. Each augmented graph is embedded using the GAT module to obtain node feature representations. These representations are then passed to the contrastive loss function, which ensures that the representations of different augmentations of the same molecule are as similar as possible, while those of different molecules remain distinguishable. We apply the NT‐Xent loss, which is computed from the cosine similarity of positive and negative sample pairs within each batch; the per‐batch losses are summed to obtain the total contrastive learning loss $L_{ss}$. This strategy enables the model to learn more discriminative and robust features, improving its ability to capture subtle molecular features and their relationships. Such representations are particularly beneficial for downstream tasks like aquatic toxicity prediction, where understanding fine‐grained structural differences between molecules is essential for accurately distinguishing toxic and non‐toxic compounds in aquatic environments.
By leveraging contrastive learning, our model becomes better equipped to detect molecular patterns that are critical for inferring the potential harm a compound may cause to aquatic organisms:
where $z_i$ and $z_{i'}$ are the latent vectors of a positive sample pair, $\tau$ is the temperature parameter, $N$ is the batch size, and $\mathrm{sim}(z_i, z_{i'})$ computes the similarity between two vectors.
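With the symbols just defined, the NT‐Xent loss takes the following standard form. It is given here as a reference sketch: the indexing over $2N$ views per batch and the averaging within a batch follow the usual convention and are assumptions about the exact variant used.

```latex
% Loss for anchor i with positive partner i' in a batch of N molecules (2N views)
\ell_{i} = -\log
  \frac{\exp\!\left( \mathrm{sim}(z_i, z_{i'}) / \tau \right)}
       {\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\!\left( \mathrm{sim}(z_i, z_k) / \tau \right)},
\qquad
\mathrm{sim}(u, v) = \frac{u^{\top} v}{\lVert u \rVert \, \lVert v \rVert}

% Per-batch loss (average over the 2N anchors, assumed) and total over batches
\mathcal{L}_{\mathrm{batch}} = \frac{1}{2N} \sum_{i=1}^{2N} \ell_{i},
\qquad
L_{ss} = \sum_{\mathrm{batches}} \mathcal{L}_{\mathrm{batch}}
```

A smaller $\tau$ sharpens the softmax over similarities, so hard negatives contribute more to the gradient.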
2.6 Pretraining Process
The pretraining experiments are performed on an NVIDIA A100 GPU. During the pretraining phase, our model leverages large‐scale unlabelled molecular graphs for self‐supervised learning, aiming to learn robust and generalisable molecular representations. We input both the original graphs and the augmented graphs into the GAT encoder. The GAT then dynamically assigns weights to neighbouring nodes through a multi‐head attention mechanism, completing the message passing process and capturing both local and global structural information. ADFC‐ATP learns discriminative embeddings that are invariant to augmentation perturbations by minimising the self‐supervised contrastive loss, which maximises the similarity between representations of different augmented views of the same molecule (positive pairs) and minimises the similarity between different molecules (negative pairs). The total loss, which includes the self‐supervised contrastive loss and possibly other losses, is then used for backpropagation. ADFC‐ATP trains on the pretraining dataset for 50 epochs, and the model parameters from the final epoch, which show the best performance, are selected for subsequent supervised fine‐tuning. For detailed parameter settings of pretraining, please refer to Table 1.
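To make the contrastive objective concrete, the following dependency‐free sketch computes an NT‐Xent loss over a small batch. It is a plain‐Python illustration, not the training code: embeddings are lists of floats assumed to be non‐zero, the loss is averaged over all 2N anchors (an assumed convention), and the default temperature of 0.1 is the value listed for pretraining.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nt_xent(view_a, view_b, temperature=0.1):
    """NT-Xent loss for N positive pairs (view_a[i], view_b[i]).

    Every other embedding among the 2N views acts as a negative.
    Returns the mean loss over all 2N anchors.
    """
    n = len(view_a)
    if n == 0 or n != len(view_b):
        raise ValueError("views must be non-empty and of equal length")
    z = list(view_a) + list(view_b)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # the other view of the same molecule
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - pos_logit
    return total / (2 * n)


originals = [[1.0, 0.0], [0.0, 1.0]]
# Views that stay close to their own molecule give a small loss ...
aligned = nt_xent(originals, [[1.0, 0.1], [0.1, 1.0]])
# ... while views that resemble the other molecule give a large one.
shuffled = nt_xent(originals, [[0.1, 1.0], [1.0, 0.1]])
```

The loss is never negative, because the positive pair always appears in the denominator.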
2.7 Fine‐Tuning Process
We fine‐tune the pretrained ADFC‐ATP on four fish toxicity datasets to further improve its performance in aquatic toxicity prediction. In this study, a multi‐task deep learning approach is employed to predict aquatic toxicity. First, the structural information of each molecular sample is converted into relevant descriptors, which serve as the input features for our model. All tasks share a common feature extraction layer to capture the general structural characteristics of the molecules. In the upper layers of the network, individual output branches are established for each toxicity prediction task to generate the corresponding predictions. During model training, a multi‐task loss function is used, where the losses of all tasks are weighted, summed, and then backpropagated together. Through this parameter‐sharing mechanism, different tasks leverage each other [37], fully exploiting the correlations among tasks and effectively enhancing the model's generalisation ability across tasks, particularly in cases where some toxicity endpoints have limited samples. After training, the model outputs the predictions for all toxicity endpoints in a single pass, achieving efficient parallel multi‐task prediction. Each dataset is randomly split into training, validation, and test sets in an 8:1:1 ratio. During fine‐tuning, the GAT backbone is initialised with pretrained parameters, and a trainable fully connected layer is appended to output the toxicity predictions (see Figure 2c). For detailed parameter settings of fine‐tuning, please refer to Table 2.
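Two pieces of this procedure lend themselves to a short sketch: the 8:1:1 random split and the weighted multi‐task loss. The code below is illustrative only. The seed of 42 is the value listed for fine‐tuning, while the binary cross‐entropy form, the use of `None` to skip compounds without a measurement for a given species, and the function names are assumptions.

```python
import math
import random

SPECIES = ("BS", "RT", "FHM", "SHM")


def split_811(items, seed=42):
    """Shuffle and split into train/validation/test at an 8:1:1 ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_valid = len(items) // 10
    return (items[:n_train],
            items[n_train:n_train + n_valid],
            items[n_train + n_valid:])


def multitask_bce(probs, labels, task_weights):
    """Weighted sum of per-task mean binary cross-entropy losses.

    probs[task] and labels[task] are equal-length lists; a label of None
    marks a compound without a measurement for that species and is skipped
    (an assumption about how missing endpoints are handled).
    """
    eps = 1e-12
    total = 0.0
    for task in SPECIES:
        pairs = [(p, y) for p, y in zip(probs[task], labels[task])
                 if y is not None]
        if not pairs:
            continue
        bce = -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                   for p, y in pairs) / len(pairs)
        total += task_weights[task] * bce
    return total


# 343 is the SHM dataset size reported in the text.
train_set, valid_set, test_set = split_811(range(343))

# Uninformative predictions of 0.5 give ln(2) per task.
probs = {t: [0.5, 0.5] for t in SPECIES}
labels = {t: [1, 0] for t in SPECIES}
labels["SHM"] = [1, None]
loss = multitask_bce(probs, labels, {t: 1.0 for t in SPECIES})
```

Integer arithmetic is used for the split sizes so that rounding never drops or duplicates a sample; the remainder goes to the test set.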
3 Results
3.1 Comparison With Other Methods
In this study, four evaluation metrics, AUC, accuracy (ACC), recall (RE) and precision (PR), are employed to comprehensively assess the classification performance of our model. AUC (area under the curve) is one of the most important evaluation indicators for binary classification models and is widely used in bioinformatics. It measures the model's ability to distinguish between positive and negative samples across classification thresholds, using the ROC curve, which plots the true positive rate against the false positive rate. The closer the AUC value is to 1, the better the overall performance of the model. ACC measures the overall correctness of the model's predictions. However, when the class distribution is imbalanced, ACC alone can be misleading, because it does not reflect how well the model identifies minority class samples. RE is the proportion of actual positive cases that are correctly identified by the model; it indicates the model's ability to detect positive samples and is therefore suitable for evaluating the risk of missed detections. PR is the proportion of true positive cases among all samples predicted as positive by the model, indicating the reliability of positive predictions. The formulas for these metrics are as follows:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{RE} = \frac{TP}{TP + FN}, \qquad \mathrm{PR} = \frac{TP}{TP + FP}$$
In the above formulas, TP (True Positive), TN (True Negative), FP (False Positive) and FN (False Negative) are key components of the confusion matrix used for evaluating classifier performance. Specifically, in the context of aquatic toxicity prediction, TP refers to the samples that are toxic to aquatic organisms and are correctly predicted as toxic by our model. TN indicates the samples that are non‐toxic and are correctly predicted as non‐toxic. FP corresponds to the non‐toxic samples that are incorrectly predicted as toxic, leading to false positives. FN refers to the toxic samples that are incorrectly predicted as non‐toxic, leading to false negatives. These components collectively help assess ADFC‐ATP's ability to correctly identify toxic and non‐toxic compounds in aquatic toxicity prediction.
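As a worked illustration of these definitions, the following sketch computes ACC, RE and PR from confusion‐matrix counts and estimates AUC as the probability that a randomly chosen toxic sample is scored above a randomly chosen non‐toxic one. The labels, scores, the 0.5 decision threshold and the function names are made up for illustration and are not taken from the paper's data.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, with 1 = toxic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def acc_re_pr(tp, tn, fp, fn):
    """Accuracy, recall and precision from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    re = tp / (tp + fn) if (tp + fn) else 0.0
    pr = tp / (tp + fp) if (tp + fp) else 0.0
    return acc, re, pr

def auc(y_true, scores):
    """AUC as the fraction of (toxic, non-toxic) pairs ranked correctly.

    Ties count as half a correct ranking. This pairwise form is
    equivalent to the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted toxic probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold is an assumption

print(confusion_counts(y_true, y_pred))             # (3, 3, 1, 1)
print(acc_re_pr(*confusion_counts(y_true, y_pred)))  # (0.75, 0.75, 0.75)
print(auc(y_true, scores))                           # 0.9375
```

Unlike ACC, RE and PR, the AUC value does not depend on the chosen threshold, which is why it is reported alongside the threshold‐dependent metrics.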
To comprehensively evaluate the performance of our model, we compare it with the deep learning models ATFPGT and CLSSATP, as well as with models from a previous study in which we apply nine machine learning algorithms and four molecular fingerprints to build a total of 36 models, together with single‐task GCN (GCN‐ST) and multi‐task GCN (GCN‐MT). For each fish dataset, we select the top three traditional machine learning methods: SVM‐PubChem, RF‐PubChem and SVM‐KRFP for the FHM dataset; SVM‐KRFP, RF‐KRFP and RF‐MACCS for the RT dataset; ANN‐CDK, RF‐PubChem and SVM‐CDKExt for the SHM dataset; and ANN‐KRFP, SVM‐KRFP and RF‐KRFP for the BS dataset. Additionally, all experiments include GCN‐MT and GCN‐ST as baseline models. Experimental results show that ADFC‐ATP achieves a higher AUC than all baseline models on the RT, FHM and SHM datasets. On the BS dataset, while ADFC‐ATP's AUC is slightly lower than that of CLSSATP and ATFPGT, it delivers better or comparable performance in ACC, PR and RE, indicating that our model remains competitive in overall classification performance. All models used for comparison are listed in Table 3.
For the RT dataset, ADFC‐ATP achieves AUC improvements of 3.0%, 1.4%, 13.8%, 10.7%, 12.0% and 14.2% over CLSSATP, ATFPGT, GCN‐MT, GCN‐ST, SVM‐KRFP and RF‐MACCS, respectively. On the FHM dataset, its AUC surpasses that of CLSSATP, ATFPGT, GCN‐MT, GCN‐ST, SVM‐PubChem, SVM‐KRFP and RF‐PubChem by 3.3%, 4.6%, 18.3%, 8.0%, 14.0%, 9.5% and 16.5%, respectively. On the SHM dataset, ADFC‐ATP outperforms CLSSATP, ATFPGT, GCN‐MT, GCN‐ST, ANN‐CDK, RF‐PubChem and SVM‐CDKExt by 0.3%, 0.7%, 22.0%, 5.4%, 17.0%, 12.7% and 12.7%, respectively, in terms of AUC. On the BS dataset, ADFC‐ATP achieves a higher AUC than GCN‐MT, GCN‐ST, ANN‐KRFP, SVM‐KRFP and RF‐KRFP by 11.0%, 3.1%, 13.1%, 12.6% and 14.5%, respectively. Although its AUC is lower than that of CLSSATP and ATFPGT by 3.3% and 2.6%, ADFC‐ATP attains a higher ACC, with improvements of 1.4% and 3.0% over these models. This may be attributed to the BS dataset's toxic‐sample ratio of 63.3%, which deviates from the ideal 50% class balance. Additionally, the bar chart in Figure 3 compares ADFC‐ATP with CLSSATP, ATFPGT, GCN‐MT and GCN‐ST; in the vast majority of cases, ADFC‐ATP achieves a higher AUC than the other models.
Heatmap comparison of performance across different models on four fish sets.
Furthermore, Figure 4 shows that ADFC‐ATP surpasses most comparison models in terms of ACC, PR and RE. For the RT dataset, ADFC‐ATP achieves an ACC of 0.895, higher than CLSSATP (0.864) and ATFPGT (0.854). Its PR of 0.809 is marginally higher than that of CLSSATP (0.805) and ATFPGT (0.806), and its RE of 0.905 exceeds that of CLSSATP (0.762) and ATFPGT (0.866). For the FHM dataset, ADFC‐ATP achieves an ACC of 0.872, higher than CLSSATP (0.774) and ATFPGT (0.779); a PR of 0.849, compared with 0.846 and 0.834; and an RE of 0.918, compared with 0.821 and 0.729. For the SHM dataset, the picture is mixed: ADFC‐ATP's ACC of 0.824 and PR of 0.667 are below the values reported for CLSSATP (0.859 and 0.748) and ATFPGT (0.887 and 0.700), whereas its RE of 0.909 is higher than that of CLSSATP (0.791) and ATFPGT (0.835). Last, for the BS dataset, ADFC‐ATP achieves an ACC of 0.899, surpassing CLSSATP (0.824) and ATFPGT (0.887); a PR of 0.903, compared with 0.866 and 0.890; and an RE of 0.903, compared with 0.791 and 0.883. These results show that ADFC‐ATP outperforms CLSSATP and ATFPGT on ACC, PR and RE for the RT, FHM and BS datasets and on RE for the SHM dataset, supporting its strong performance in aquatic toxicity prediction. This indicates that our model achieves high overall predictive accuracy and correctly identifies a large proportion of true positives among all samples predicted as positive. The performance comparison of different models across the four fish toxicity datasets is visualised in Figure 5. As shown in the heatmaps, each subfigure presents the values of the four evaluation metrics (AUC, ACC, RE and PR) for all methods on the corresponding dataset. 
This comprehensive visualisation intuitively highlights the strengths and weaknesses of each model in different aspects and datasets. For example, ADFC‐ATP generally achieves higher values across most metrics and datasets, further confirming its robust and consistent performance. Meanwhile, the colour gradients clearly reflect the variation in metric values among models, which provides direct evidence of ADFC‐ATP's superiority in both overall classification accuracy and the reliability of positive predictions. Such detailed comparison helps to identify not only the best‐performing models, but also their specific advantages on various endpoints.
AUC comparison of ADFC‐ATP, CLSSATP, ATFPGT, GCN‐ST and GCN‐MT on four fish datasets.
Confusion matrices for the four aquatic toxicity prediction datasets under random splitting.
Ablation Experiments
3.2
ADFC‐ATP applies data augmentation to its pretraining data and uses self‐supervised pretraining to initialise the model that is fine‐tuned on the downstream tasks. To assess the contributions of both augmentation and pretraining, we perform a series of ablation experiments. We evaluate three augmentation settings, 50%, 30% and 0%, denoted ADFC‐ATP, ADFC‐ATP_0.3 and ADFC‐ATP_0.0, respectively. We limit the augmentation ratio to 50%, because a larger share of augmented samples may introduce noise and distortion, causing the model to drift from the true distribution and impairing its ability to recognise unaugmented molecular graphs. We also measure performance without any pretraining, referring to this variant as ADFC‐ATP_NPL. In addition to the augmentation ratio and pretraining, we investigate the effect of the molecular fingerprint. We use the RDKit toolkit to generate ECFP fingerprints with a radius of 2 and a length of 1024 bits, and compare ADFC‐ATP against a variant that uses these traditional ECFP fingerprints instead of AttentiveFP, referring to this variant as ADFC‐ATP_ECFP.
All ablation experiments use the same configuration. The performance of each model on the four fish species datasets is shown in Figure 6. ADFC‐ATP achieves higher AUC values on all four fish species than ADFC‐ATP_0.3 and ADFC‐ATP_0.0, indicating that an augmentation ratio of 50% yields the best performance among the settings tested. The augmented samples, generated through scaffold‐preserving edge reconnection and functional group substitution, simulate natural molecular variations and side‐chain diversity; they enable the model to adapt to subtle changes in molecular structure, which enhances its generalisation ability and leads to better performance across multiple datasets. Meanwhile, as shown in Table 4, the AUC of ADFC‐ATP_NPL on the four fish toxicity datasets is 0.903, 0.884, 0.880 and 0.888, respectively, indicating that introducing the pretraining module improves classification ability by 9.8%, 13.9%, 10.3% and 9.3% compared to the model without pretraining. Finally, the AUC of ADFC‐ATP_ECFP on the four fish toxicity datasets is 0.725, 0.732, 0.704 and 0.751, respectively, so our model using AttentiveFP fingerprints achieves improvements of 21.7%, 19.5%, 20.9% and 15.5% over ADFC‐ATP_ECFP.
AUC comparison for ADFC‐ATP and ablation experiments across four fish species datasets.
Overall, augmented samples generated through scaffold‐preserving edge reconnection and functional group substitution can simulate natural molecular variations and side‐chain diversity, enabling our model to adapt to subtle changes in molecular structure and improve its generalisation ability. At the same time, pretraining allows ADFC‐ATP to learn general molecular representations from large amounts of unlabelled data, providing better initial weights for downstream small‐sample tasks and resulting in superior predictive performance. The comparison with ADFC‐ATP_ECFP further suggests that AttentiveFP fingerprints are more effective than ECFP in capturing task‐relevant molecular structural features, thus enhancing the model's performance in complex molecular classification tasks. In conclusion, the analysis of the different ablation variants highlights the critical role of data augmentation, pretraining and fingerprint type in improving the model's ability to represent complex structures and functional groups. The superior performance of ADFC‐ATP underscores the importance of combining neural network‐based fingerprints with appropriate data augmentation strategies in molecular modelling tasks.
Model Interpretability
3.3
Identifying molecules with structurally similar yet opposite properties has long been a challenge in the field of molecular property prediction. This requires the model to accurately recognise those substructures that are small yet crucial in molecular properties.
Our training strategy is designed to enhance ADFC‐ATP's ability to capture subtle differences between molecules. Figure 7 shows four pairs of molecules from our aquatic toxicity datasets that have a Dice similarity above 90% based on Morgan fingerprints but opposite toxicity labels, all of which are successfully distinguished by ADFC‐ATP. Although the molecules in each pair share the same core scaffold, even small variations in substituents can greatly change their aquatic toxicity. We obtain atom‐level weights by masking individual atoms. During the fine‐tuning phase, ADFC‐ATP pays more attention to the substituents and functional groups that determine aquatic toxicity, enabling it to accurately differentiate between these pairs.
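The Dice similarity used to select such pairs can be sketched as follows, assuming each Morgan fingerprint is represented as the set of its on‐bit indices. The bit sets below are made‐up placeholders rather than fingerprints of the molecules in Figure 7, and the function name is hypothetical; in practice the fingerprints would come from a cheminformatics toolkit such as RDKit.

```python
def dice_similarity(bits_a, bits_b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two sets of on-bits."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(bits_a & bits_b)
    return 2.0 * shared / (len(bits_a) + len(bits_b))

# Hypothetical on-bit sets for two molecules that share a scaffold and
# differ in a single substituent (one bit gained, one bit lost).
molecule_a = set(range(0, 20))
molecule_b = set(range(1, 21))

similarity = dice_similarity(molecule_a, molecule_b)
print(similarity)          # 0.95
print(similarity > 0.90)   # True: the pair would pass the 90% filter
```

A pair can therefore exceed the 90% threshold while differing in exactly the substituent bits that matter for toxicity, which is what makes these cases hard for fingerprint‐only models.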
Visualisation of representations learned by ADFC‐ATP for four structurally similar pairs with opposite properties. The molecule pair selected from RT, FHM, SHM, BS is displayed in (a–d), respectively.
In summary, ADFC‐ATP proposed in this study not only leverages dual‐view data augmentation to effectively enhance the diversity of training data and the comprehensiveness of molecular representations, but also integrates molecular fingerprints extracted by AttentiveFP with molecular graph features generated by a fine‐tuned GAT. This greatly improves the model's sensitivity and ability to capture subtle structural differences. In toxicity prediction tasks, ADFC‐ATP can precisely distinguish between structurally similar molecules with completely opposite toxicities, and the attention mechanism provides clear interpretability, directly highlighting the key functional groups and substituents that influence molecular properties. These results demonstrate that molecular representation methods based on dual‐view augmentation and feature fusion significantly improve the accuracy of aquatic toxicity prediction and hold great potential for applications in drug discovery and molecular design.
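One simple way to picture an attention‐based fusion of the two views is a softmax gate that scores each view and returns their weighted combination. The sketch below is an assumption‐laden illustration, not the fusion module of ADFC‐ATP: the dot‐product scoring, the `query` vector (which would be a learned parameter in practice), the equal embedding sizes and all names are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(graph_emb, fingerprint_emb, query):
    """Fuse two same-sized embeddings with softmax attention weights.

    Each view is scored by its dot product with `query`; the fused
    vector is the attention-weighted sum of the two views.
    """
    views = (graph_emb, fingerprint_emb)
    scores = [sum(q * h for q, h in zip(query, view)) for view in views]
    alpha = softmax(scores)
    fused = [alpha[0] * g + alpha[1] * f
             for g, f in zip(graph_emb, fingerprint_emb)]
    return fused, alpha

# Toy example: the query favours the graph view, so it gets more weight.
fused, alpha = attention_fuse([1.0, 0.0], [0.0, 1.0], query=[2.0, 0.0])
print(alpha)
print(fused)
```

Because the weights are computed per molecule, the balance between the graph view and the fingerprint view can shift from one compound to the next instead of being fixed in advance.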
Discussion and Conclusion
4
In this study, we employ a self‐supervised molecular graph pretraining framework named ADFC‐ATP, which is based on graph contrastive learning. By incorporating structure‐perturbation augmentation and contrastive loss, our approach enhances the representation capability of molecular graphs. For downstream tasks, we use datasets from four different fish species, with each species corresponding to an independent task. Experimental results demonstrate that the proposed model can jointly predict aquatic toxicity across all four tasks. By fully integrating two molecular representation methods in the downstream tasks, ADFC‐ATP effectively learns both local and global molecular features, demonstrating excellent performance in acute toxicity prediction. Furthermore, ablation studies confirm that a certain proportion of dual‐view data augmentation combined with pretraining jointly enhances the model's effectiveness.
ADFC‐ATP achieves excellent performance on downstream aquatic toxicity prediction tasks, benefiting from a comprehensive dataset collected from four fish species, thus covering diverse aquatic environments. The model distinguishes itself through several key innovations. First, ADFC‐ATP employs advanced data augmentation strategies, including scaffold‐preserving edge reconnection and functional group substitution, to simulate realistic molecular variations. This not only enriches the training data but also improves the model's robustness to the structural diversity commonly seen in environmental samples. Second, ADFC‐ATP incorporates a dual‐view feature fusion module that integrates AttentiveFP neural fingerprints with GAT‐based graph embeddings, thereby capturing complementary molecular information from multiple representation spaces. This fusion approach enhances the model's ability to learn complex structure–activity relationships that are often missed by single‐representation models. Third, the use of a self‐supervised pretraining framework based on GAT enables ADFC‐ATP to leverage large‐scale unlabelled molecular data, acquiring generalised molecular knowledge prior to fine‐tuning on downstream tasks. The attention mechanism within GAT allows the model to adaptively focus on the most informative atoms and bonds, further boosting interpretability and predictive accuracy. Moreover, ADFC‐ATP is designed with strong transferability and adaptability, as demonstrated by its robust performance across multiple datasets and its effective handling of domain shifts caused by varying sample distributions among different fish species. Ablation studies confirm that both the data augmentation ratio and the integration of pretraining modules contribute significantly to the observed performance gains. 
In addition, ADFC‐ATP provides model interpretation capabilities by outputting atom‐level and substructure‐level attention weights, allowing researchers to identify key functional groups or fragments contributing to toxicity.
However, ADFC‐ATP still faces several limitations and challenges. First, while the combination of multi‐view data augmentation and pretraining methods effectively increases data diversity and model robustness, it cannot completely overcome the problems brought by limited or highly imbalanced datasets. When the downstream dataset is small or suffers from severe class imbalance, the model's accuracy and stability may still be adversely affected. This means that the performance of ADFC‐ATP heavily depends on the size, quality and class balance of the dataset used for fine‐tuning. Second, the choice of hyperparameters during fine‐tuning significantly impacts the final performance. Finding the optimal parameter configuration is often a time‐consuming process and may require extensive experiments. Due to the complexity of the model structure and training pipeline, ADFC‐ATP requires longer training times and substantial computational resources. Last, although ADFC‐ATP has demonstrated strong performance in aquatic toxicity prediction, its transferability and effectiveness in other molecular property prediction domains remain to be explored [38]. Despite these challenges, ADFC‐ATP shows broad application prospects in chemical design and risk assessment. Our model can help researchers efficiently evaluate aquatic toxicity without relying on animal testing, thus significantly reducing experimental cost and time. By deeply learning the internal structural relationships of molecules, ADFC‐ATP enhances our understanding of the connections between molecular structure and functional properties. These advantages stem from its innovative architectural design, which endows the model with adaptability and scalability. We believe that ADFC‐ATP will play an important role in future aquatic ecological risk assessments and related fields.
Author Contributions
Jixuan Jia: data curation, investigation, methodology, software, writing – original draft. Xin Yang: data curation, investigation, methodology, software. Ying Fang: visualisation, validation. Honghong Su: conceptualisation, methodology, project administration, supervision, writing – review and editing. Qi Zhao: conceptualisation, funding acquisition, methodology, project administration, supervision, writing – original draft, writing – review and editing.
Funding
This study is supported by the Science and Technology Plan Project of Liaoning Province (2025‐MSLH‐351) and the Fundamental Research Funds for the Liaoning Universities (LJ212410146026).
Conflicts of Interest
The authors declare no conflicts of interest.
References
- 1. D. Wang, X. Leng, Y. Tian, J. Liu, J. Zou, and S. Xie, “Toxic Effects of Koumine on the Early‐Life Development Stage of Zebrafish,” Toxics 11 (2023): 853. PMID: 37888703; DOI: 10.3390/toxics11100853; PMCID: PMC10611223.
- 2. C. N. Lowe, N. Charest, C. Ramsland, D. T. Chang, T. M. Martin, and A. J. Williams, “Transparency in Modeling Through Careful Application of OECD'S QSAR/QSPR Principles via a Curated Water Solubility Data Set,” Chemical Research in Toxicology 36 (2023): 465–478. PMID: 36877669; DOI: 10.1021/acs.chemrestox.2c00379; PMCID: PMC10357388.
- 3. P. J. Landrigan, H. Raps, M. Cropper, et al., “The Minderoo‐Monaco Commission on Plastics and Human Health,” Annals of Global Health 89 (2023): 23. PMID: 36969097; DOI: 10.5334/aogh.4056; PMCID: PMC10038118.
- 4. K. Grabicova, R. Grabic, G. Fedorova, et al., “Bioaccumulation of Psychoactive Pharmaceuticals in Fish in an Effluent Dominated Stream,” Water Research 124 (2017): 654–662. PMID: 28825984; DOI: 10.1016/j.watres.2017.08.018.
- 5. L. Kullmann, F. Habedank, B. Kullmann, et al., “Evaluation of the Bioaccumulation Potential of Alizarin Red S in Fish Muscle Tissue Using the European Eel as a Model,” Analytical and Bioanalytical Chemistry 412 (2020): 1181–1192. PMID: 31900528; DOI: 10.1007/s00216-019-02346-4.
- 6. S. Y. Seo, Y. H. Park, S. K. Jung, and J. Kim, “Acute Toxicity Evaluation of the Disinfectant Containing Percarbonate and Tetraacetylethylenediamine by Measuring Behavioral Responses of Small Fish Using Image Analysis,” Biotechnology and Bioprocess Engineering 27 (2022): 687–696. PMID: 35730032; DOI: 10.1007/s12257-021-0419-0; PMCID: PMC9188641.
- 7. S. E. Belanger, A. D. Lillicrap, S. J. Moe, R. Wolf, K. Connors, and M. R. Embry, “Weight of Evidence Tools in the Prediction of Acute Fish Toxicity,” Integrated Environmental Assessment and Management 19 (2023): 1220–1234. PMID: 35049115; DOI: 10.1002/ieam.4581.
- 8. S. Schmidt, M. Schindler, D. Faber, and J. Hager, “Fish Early Life Stage Toxicity Prediction From Acute Daphnid Toxicity and Quantum Chemistry,” SAR and QSAR in Environmental Research 32 (2021): 151–174. PMID: 33525942; DOI: 10.1080/1062936X.2021.1874514.
