Task-Specific Sparse Feature Masks for Molecular Toxicity Prediction with Chemical Language Models
Kwun Sy Lee, Jiawei Chen, Fuk Sheng Ford Chung, Tianyu Zhao, Zhenyuan Chen, Debby D. Wang

TL;DR
This paper introduces a multi-task learning framework with sparse attention modules for molecular toxicity prediction, improving accuracy and interpretability by highlighting key molecular fragments influencing safety assessments.
Contribution
The paper presents a multi-task learning architecture that pairs a shared chemical language model with L1-sparse, task-specific attention modules, aiming to improve both predictive performance and interpretability in toxicity prediction.
Findings
Outperforms single-task and standard MTL baselines on benchmark datasets
Provides chemically intuitive visualizations of influential molecular fragments
Trains end-to-end and adapts to various transformer-based backbones
Abstract
Reliable in silico molecular toxicity prediction is a cornerstone of modern drug discovery, offering a scalable alternative to experimental screening. However, the black-box nature of state-of-the-art models remains a significant barrier to adoption, as high-stakes safety decisions demand verifiable structural insights alongside predictive performance. To address this, we propose a novel multi-task learning (MTL) framework designed to jointly enhance accuracy and interpretability. Our architecture integrates a shared chemical language model with task-specific attention modules. By imposing an L1 sparsity penalty on these modules, the framework is constrained to focus on a minimal set of salient molecular fragments for each distinct toxicity endpoint. The resulting framework is trained end-to-end and is readily adaptable to various transformer-based backbones. Evaluated on the ClinTox,…
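To make the objective described in the abstract concrete, the following is a minimal sketch of a multi-task loss with an L1 penalty on task-specific token masks. It is not the authors' implementation. The sigmoid gating, the gate-weighted mean pooling, the linear heads, the weight `lam`, the endpoint names, and the toy feature values are all assumptions introduced for illustration; in the paper the token features would come from the shared chemical language model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def task_gates(mask_logits):
    # One gate in (0, 1) per token; assumed form of a task-specific mask.
    return [sigmoid(z) for z in mask_logits]

def masked_pool(token_feats, gates):
    # Gate-weighted mean of token features (assumed pooling scheme).
    total = sum(gates)
    dim = len(token_feats[0])
    return [
        sum(g * tok[d] for g, tok in zip(gates, token_feats)) / total
        for d in range(dim)
    ]

def bce(prob, label):
    # Binary cross-entropy for one example, with a small epsilon for stability.
    eps = 1e-12
    return -(label * math.log(prob + eps) + (1 - label) * math.log(1 - prob + eps))

def multitask_loss(token_feats, heads, labels, lam):
    # Sum of per-task losses plus lam times the L1 norm of every task's gates.
    task_loss, l1 = 0.0, 0.0
    for task, head in heads.items():
        gates = task_gates(head["mask_logits"])
        pooled = masked_pool(token_feats, gates)
        logit = sum(w * x for w, x in zip(head["w"], pooled)) + head["b"]
        task_loss += bce(sigmoid(logit), labels[task])
        l1 += sum(abs(g) for g in gates)
    return task_loss + lam * l1, task_loss, l1

# Toy stand-in for shared-encoder output: 3 tokens, 2 features each.
token_feats = [[0.2, 1.0], [0.9, -0.3], [0.1, 0.4]]

# Hypothetical endpoints, each with its own mask and linear head.
heads = {
    "endpoint_a": {"mask_logits": [2.0, -3.0, -3.0], "w": [0.5, 1.5], "b": -0.2},
    "endpoint_b": {"mask_logits": [-3.0, 2.0, 0.5], "w": [-1.0, 0.3], "b": 0.1},
}
labels = {"endpoint_a": 1, "endpoint_b": 0}

total, task_loss, l1 = multitask_loss(token_feats, heads, labels, lam=0.01)
print(f"total={total:.4f} task={task_loss:.4f} l1={l1:.4f}")
```

Because each task owns its own mask, the penalty can drive different tokens toward zero for different endpoints, and the surviving gates are what would be visualized as influential fragments. In a real training loop the mask logits and head weights would be learned by gradient descent together with the backbone.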
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Machine Learning in Bioinformatics
