Accurate Prediction of Ligand-Protein Interaction Affinities with Fine-Tuned Small Language Models
Ben Fauber

TL;DR
This paper uses instruction fine-tuned small language models to predict ligand-protein interaction affinities from only the ligand's SMILES string and the protein's amino acid sequence, reporting better accuracy than traditional ML and FEP+ methods on out-of-sample data in a zero-shot setting.
Contribution
It introduces a new method leveraging fine-tuned generative small language models for ligand-protein affinity prediction using only SMILES and amino acid sequences.
Findings
Outperforms ML and FEP+ methods in accuracy
Effective in zero-shot prediction scenarios
Applicable to challenging therapeutic targets
Abstract
We describe the accurate prediction of ligand-protein interaction (LPI) affinities, also known as drug-target interactions (DTI), with instruction fine-tuned pretrained generative small language models (SLMs). We achieved accurate predictions for a range of affinity values associated with ligand-protein interactions on out-of-sample data in a zero-shot setting. Only the SMILES string of the ligand and the amino acid sequence of the protein were used as the model inputs. Our results demonstrate a clear improvement over machine learning (ML) and free-energy perturbation (FEP+) based methods in accurately predicting a range of ligand-protein interaction affinities, which can be leveraged to further accelerate drug discovery campaigns against challenging therapeutic targets.
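To make the input format concrete, here is a minimal sketch of how one ligand-protein pair might be turned into an instruction-tuning record. The abstract states only that the SMILES string and the amino acid sequence are the model inputs; the prompt wording, the field names (`instruction`, `input`, `output`), the `build_record` helper, and the placeholder label are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Optional

# Hypothetical prompt wording; the abstract does not give the paper's template.
INSTRUCTION = "Predict the binding affinity of the ligand for the protein."


def build_record(smiles: str, sequence: str,
                 affinity_label: Optional[str] = None) -> dict:
    """Format one ligand-protein pair as an instruction-tuning record.

    With affinity_label set, the record is a training example; without it,
    the record is an inference-time prompt.
    """
    smiles = smiles.strip()
    # Drop whitespace and upper-case the one-letter amino acid codes.
    sequence = "".join(sequence.split()).upper()
    if not smiles or not sequence:
        raise ValueError("a SMILES string and an amino acid sequence are both required")
    record = {
        "instruction": INSTRUCTION,
        "input": f"Ligand SMILES: {smiles}\nProtein sequence: {sequence}",
    }
    if affinity_label is not None:
        record["output"] = affinity_label
    return record


# Aspirin SMILES with a toy ten-residue sequence; the label is a placeholder,
# not a measured affinity.
example = build_record("CC(=O)Oc1ccccc1C(=O)O", "mkt ayi akq r", "placeholder label")
```

Keeping the label optional lets the same helper serve both fine-tuning and zero-shot prediction, where the model is expected to generate the affinity itself.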
Taxonomy
Topics: Computational Drug Discovery Methods · Bioinformatics and Genomic Networks · Biomedical Text Mining and Ontologies
