Fine-Tuning ChemBERTa for Predicting Inhibitory Activity Against TDP1 Using Deep Learning
Baichuan Zeng

TL;DR
This paper introduces a deep learning model based on ChemBERTa for predicting the inhibitory activity of molecules against TDP1, demonstrating improved accuracy and screening efficiency over classical methods, thus aiding early drug discovery.
Contribution
The study presents a novel fine-tuning approach for ChemBERTa that predicts pIC50 values, trains on large-scale data, and addresses activity imbalance, supported by comprehensive evaluation and validation.
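The regression target, pIC50, is the negative base-10 logarithm of the half-maximal inhibitory concentration (IC50) expressed in molar units, so higher values mean more potent inhibitors. The sketch below converts IC50 values into pIC50 targets. It assumes the raw measurements are reported in nanomolar, which the source does not specify, and the function name `ic50_nm_to_pic50` is hypothetical.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar (assumed unit) to pIC50 = -log10(IC50 [M]).

    Since 1 nM = 1e-9 M, -log10(x * 1e-9) = 9 - log10(x); computing it this
    way avoids an extra floating-point multiplication.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)


if __name__ == "__main__":
    for ic50 in (10.0, 1000.0, 50_000.0):
        print(f"IC50 = {ic50:g} nM -> pIC50 = {ic50_nm_to_pic50(ic50):.2f}")
```

For example, a 10 nM inhibitor maps to pIC50 = 8 and a 1 µM (1000 nM) inhibitor maps to pIC50 = 6. Working on this log scale keeps targets spanning several orders of magnitude in a compact numeric range suited to regression.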
Findings
Outperforms classical baselines in regression accuracy.
Achieves a high enrichment factor at the top 1% of ranked compounds (EF@1%) of 17.4.
Provides a robust tool for prioritizing TDP1 inhibitors.
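EF@1% measures how much more often actives appear in the top-scoring 1% of a ranked library than they would under random selection: EF@x% = (actives in top x% / size of top x%) / (total actives / library size). A random ranking therefore scores about 1. The minimal sketch below computes it. Rounding the size of the top slice up with `ceil` is one common convention among several, and the function name `enrichment_factor` is hypothetical. For context, if the evaluation set keeps the dataset's 2.1% active rate, the highest attainable EF@1% is 1 / 0.021 ≈ 47.6.

```python
import math
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    is_active: Sequence[bool],
    fraction: float = 0.01,
) -> float:
    """Enrichment factor at the top `fraction` of a ranked screening library."""
    n = len(scores)
    if n == 0 or n != len(is_active):
        raise ValueError("scores and is_active must be non-empty and of equal length")
    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        raise ValueError("EF is undefined when there are no actives")
    # Size of the top slice; round away float noise (e.g. 0.29 * 100) before ceil.
    n_top = max(1, math.ceil(round(fraction * n, 9)))
    # Rank compounds by model score (e.g. predicted pIC50), highest first.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:n_top] if is_active[i])
    # Hit rate in the top slice divided by the hit rate of the whole library.
    return (hits / n_top) / (total_actives / n)


if __name__ == "__main__":
    # 1000 compounds, 21 actives (2.1%); the 10 top-ranked compounds contain 4 actives.
    actives = [True] * 4 + [False] * 6 + [True] * 17 + [False] * 973
    preds = [float(1000 - i) for i in range(1000)]
    print(round(enrichment_factor(preds, actives), 2))
```

In this toy library the top 1% has a 40% hit rate against a 2.1% base rate, giving EF@1% = 0.40 / 0.021 ≈ 19.05.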
Abstract
Predicting the inhibitory potency of small molecules against Tyrosyl-DNA Phosphodiesterase 1 (TDP1), a key target in overcoming cancer chemoresistance, remains a critical challenge in early drug discovery. We present a deep learning framework for the quantitative regression of pIC50 values from molecular Simplified Molecular Input Line Entry System (SMILES) strings using fine-tuned variants of ChemBERTa, a pre-trained chemical language model. Leveraging a large-scale consensus dataset of 177,092 compounds, we systematically evaluate two pre-training strategies, Masked Language Modeling (MLM) and Masked Token Regression (MTR), under stratified data splits and sample weighting to address severe activity imbalance (only 2.1% of compounds are active). Our approach outperforms classical baselines, including a Random Predictor, in both regression accuracy and virtual screening utility, and has competitive performance…
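The abstract pairs stratified data splits with sample weighting to cope with a 2.1% active rate. One plausible reading is sketched below, under stated assumptions:

- Binarize pIC50 at an activity cutoff.
- Split each class separately so train and test keep the same active ratio.
- Give each class equal total weight in a weighted mean-squared-error loss.

The 6.0 cutoff (1 µM), the inverse-frequency weighting formula n_samples / (n_classes × n_class), which is the same formula scikit-learn uses for `class_weight="balanced"`, and all function names are hypothetical. The source does not state which threshold or weighting scheme the author used.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

ACTIVE_THRESHOLD = 6.0  # hypothetical pIC50 cutoff (1 uM), not taken from the paper


def activity_label(pic50: float, threshold: float = ACTIVE_THRESHOLD) -> int:
    """1 if the compound counts as active under the assumed cutoff, else 0."""
    return int(pic50 >= threshold)


def stratified_split(
    labels: Sequence[int], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split indices so each class contributes ~test_fraction of its members to test."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train: List[int] = []
    test: List[int] = []
    for idxs in by_class.values():
        idxs = idxs[:]  # copy before shuffling in place
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


def balanced_weights(labels: Sequence[int]) -> List[float]:
    """Per-sample weights n / (k * n_c), so every class has the same total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


def weighted_mse(
    preds: Sequence[float], targets: Sequence[float], weights: Sequence[float]
) -> float:
    """Weighted mean squared error over pIC50 predictions."""
    num = sum(w * (p - t) ** 2 for p, t, w in zip(preds, targets, weights))
    return num / sum(weights)


if __name__ == "__main__":
    pic50 = [7.0] * 21 + [4.5] * 979  # toy data with a 2.1% active rate
    labels = [activity_label(p) for p in pic50]
    train, test = stratified_split(labels, test_fraction=0.2, seed=42)
    print(len(train), len(test), sum(labels[i] for i in test))
    weights = balanced_weights(labels)
    print(weights[0], weights[-1])
```

With this weighting each of the 21 actives carries roughly 46 times the weight of an inactive, so errors on rare potent compounds are not drowned out in the regression loss.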
Taxonomy
Topics: Computational Drug Discovery Methods · Phosphodiesterase function and regulation · Cell Image Analysis Techniques
