Stacked ensemble-based mutagenicity prediction model using multiple modalities with graph attention network
Tanya Liyaqat, Tanvir Ahmad, Mohammad Kashif, Chandni Saxena

TL;DR
This paper presents a novel stacked ensemble model that combines multiple molecular data modalities and graph attention networks to improve mutagenicity prediction accuracy, outperforming state-of-the-art methods on standard datasets.
Contribution
Introduces a multi-modal, stacked ensemble mutagenicity prediction model integrating SMILES and molecular graph data with explainability features.
Findings
Achieves 95.21% AUC on the Hansen dataset
Outperforms existing SOTA methods
Utilizes SHAP for feature importance analysis
Abstract
Mutagenicity is a concern because of its association with genetic mutations, which can have a variety of negative consequences, including the development of cancer. Early identification of mutagenic compounds in the drug development process is therefore crucial for preventing the progression of unsafe candidates and reducing development costs. While computational techniques, especially machine learning models, have become increasingly prevalent for this endpoint, they rely on a single modality. In this work, we introduce a novel stacked ensemble-based mutagenicity prediction model that incorporates multiple modalities, such as the simplified molecular input line entry system (SMILES) and the molecular graph. These modalities capture diverse information about molecules, such as substructural, physicochemical, geometrical, and topological information. To derive substructural, geometrical and physicochemical…
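The general idea of stacking over several modalities can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the authors' architecture: it uses plain logistic regression as every base learner and as the meta-learner, and synthetic feature matrices stand in for SMILES-derived and graph-derived features. All function and variable names (`fit_logreg`, `out_of_fold`, `fit_stack`, `smiles_view`, `graph_view`) are hypothetical. Each modality gets its own base model, and a meta-learner is trained on the base models' out-of-fold probabilities.

```python
import numpy as np

def fit_logreg(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by gradient descent; returns weights (bias last)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    """Predicted probability of the positive class."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def out_of_fold(X, y, k=5):
    """Out-of-fold probabilities: each row is scored by a model that never saw it."""
    oof = np.zeros(len(y))
    all_idx = np.arange(len(y))
    for held_out in np.array_split(all_idx, k):
        train = np.setdiff1d(all_idx, held_out)
        w = fit_logreg(X[train], y[train])
        oof[held_out] = predict_proba(w, X[held_out])
    return oof

def fit_stack(views, y, k=5):
    """views: list of per-modality feature matrices with aligned rows."""
    # The meta-learner's inputs are one out-of-fold probability per modality.
    meta_X = np.column_stack([out_of_fold(X, y, k) for X in views])
    base = [fit_logreg(X, y) for X in views]  # refit base models on all data
    meta = fit_logreg(meta_X, y)
    return base, meta

def predict_stack(base, meta, views):
    meta_X = np.column_stack([predict_proba(w, X) for w, X in zip(base, views)])
    return predict_proba(meta, meta_X)

# Synthetic stand-in data (assumption): two noisy "modalities" of the same label.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200).astype(float)
smiles_view = y[:, None] + rng.normal(0.0, 1.0, size=(200, 4))
graph_view = y[:, None] + rng.normal(0.0, 1.0, size=(200, 3))

base, meta = fit_stack([smiles_view, graph_view], y)
probs = predict_stack(base, meta, [smiles_view, graph_view])
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for rows the base models were fitted on, is the usual way to keep the second stage from simply trusting overfit base models.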
Taxonomy
Topics: Machine Learning in Materials Science
Methods: Softmax · Attention Is All You Need · Shapley Additive Explanations
