Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language
Philipp Seidl, Andreu Vall, Sepp Hochreiter, Günter Klambauer

TL;DR
This paper introduces CLAMP, a novel activity prediction model that leverages natural language understanding and a modular architecture to improve zero- and few-shot learning in drug discovery tasks.
Contribution
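The paper proposes a new modular architecture, with separate modules for chemical and natural language inputs, and a contrastive pre-training objective. Together these let an activity prediction model adapt to new tasks at inference time from a textual description of the task, without additional training.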
Findings
CLAMP outperforms existing models on few-shot benchmarks.
CLAMP demonstrates strong zero-shot prediction capabilities.
The modular design and pre-training improve adaptability and accuracy.
Abstract
Activity and property prediction models are the central workhorses in drug discovery and materials sciences, but currently they have to be trained or fine-tuned for new tasks. Without training or fine-tuning, scientific language models could be used for such low-data tasks through their announced zero- and few-shot capabilities. However, their predictive quality at activity prediction is lacking. In this work, we envision a novel type of activity prediction model that is able to adapt to new prediction tasks at inference time, via understanding textual information describing the task. To this end, we propose a new architecture with separate modules for chemical and natural language inputs, and a contrastive pre-training objective on data from large biochemical databases. In extensive experiments, we show that our method CLAMP yields improved predictive performance on few-shot learning…
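The two-module design and contrastive objective described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper's code: the class name `TwoTowerScorer`, the random linear projections standing in for trained encoders, the pre-computed feature vectors, the `temperature` value, and the choice of a sigmoid over scaled cosine similarity with a binary cross-entropy loss. The sketch only shows how a molecule and a task description can be scored in a shared embedding space; it does not perform any training.

```python
import math
import random


def linear(weights, x):
    """Project vector x with a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def l2_normalize(v):
    """Scale v to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(c * c for c in v)) or 1.0
    return [c / norm for c in v]


def random_matrix(rows, cols, rng):
    """Random Gaussian weights; a stand-in for trained encoder parameters."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class TwoTowerScorer:
    """Hypothetical two-module scorer: one encoder per input modality."""

    def __init__(self, mol_dim, text_dim, embed_dim, temperature=0.1, seed=0):
        rng = random.Random(seed)
        self.w_mol = random_matrix(embed_dim, mol_dim, rng)    # chemical module
        self.w_text = random_matrix(embed_dim, text_dim, rng)  # language module
        self.temperature = temperature  # assumed value, not from the paper

    def embed_molecule(self, mol_features):
        return l2_normalize(linear(self.w_mol, mol_features))

    def embed_task(self, text_features):
        return l2_normalize(linear(self.w_text, text_features))

    def predict(self, mol_features, text_features):
        """Activity probability: sigmoid of temperature-scaled cosine similarity."""
        m = self.embed_molecule(mol_features)
        a = self.embed_task(text_features)
        similarity = sum(mi * ai for mi, ai in zip(m, a))
        return 1.0 / (1.0 + math.exp(-similarity / self.temperature))


def contrastive_loss(model, triples):
    """Mean binary cross-entropy over (molecule, task text, label) triples.

    Active pairs (label 1) are pushed toward high similarity and inactive
    pairs (label 0) toward low similarity.
    """
    eps = 1e-12
    total = 0.0
    for mol, text, label in triples:
        p = model.predict(mol, text)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(triples)


# Toy inputs: in practice these would be molecular fingerprints and
# embeddings of the textual assay description.
model = TwoTowerScorer(mol_dim=4, text_dim=3, embed_dim=2)
mol = [1.0, 0.0, 1.0, 0.0]
task = [0.2, 0.7, 0.1]
p = model.predict(mol, task)
loss = contrastive_loss(model, [(mol, task, 1), ([0.0, 1.0, 0.0, 1.0], task, 0)])
```

Because the task enters only through its text embedding, a new task can be scored at inference time by embedding its description, which is the zero-shot setting the abstract refers to.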
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science
Methods: Contrastive Learning
