Natural Language Processing Methods for the Study of Protein-Ligand Interactions

James Michels; Ramya Bandarupalli; Amin Ahangar Akbari; Thai Le; Hong Xiao; Jing Li; and Erik F. Y. Hom

arXiv:2409.13057·q-bio.QM·October 18, 2024

TL;DR

This review explores how NLP techniques like transformers and attention mechanisms are applied to predict protein-ligand interactions, highlighting recent advances, limitations, and future challenges in this interdisciplinary field.

Contribution

The review gives a comprehensive overview of NLP methods used in protein-ligand interaction (PLI) prediction, emphasizing recent developments and identifying key challenges for future research.

Findings

1. NLP approaches have improved PLI prediction accuracy.

2. Transformers and attention mechanisms are central to recent advances.

3. Current limitations include data scarcity and model interpretability.

Abstract

Recent advances in Natural Language Processing (NLP) have ignited interest in developing effective methods for predicting protein-ligand interactions (PLIs) given their relevance to drug discovery and protein engineering efforts and the ever-growing volume of biochemical sequence and structural data available. The parallels between human languages and the "languages" used to represent proteins and ligands have enabled the use of NLP machine learning approaches to advance PLI studies. In this review, we explain where and how such approaches have been applied in the recent literature and discuss useful mechanisms such as long short-term memory, transformers, and attention. We conclude with a discussion of the current limitations of NLP methods for the study of PLIs as well as key challenges that need to be addressed in future work.
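One way to picture the parallel the abstract draws between human languages and the "languages" of proteins and ligands is to tokenize both kinds of strings, as an NLP model would tokenize a sentence. The sketch below is illustrative only and is not taken from the review: it assumes proteins are given as one-letter amino-acid sequences and ligands as SMILES strings, and the function names, the simplified SMILES regular expression, and the example inputs are all hypothetical choices.

```python
import re

# Simplified SMILES token pattern (an assumption for illustration):
# bracket atoms, two-letter halogens, single letters, digits, then any
# other non-space symbol. Multi-digit ring closures such as %10 are not
# handled as single tokens here.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[A-Za-z]|\d|[^\sA-Za-z\d]")


def tokenize_protein(sequence):
    """Split a one-letter amino-acid sequence into residue tokens."""
    return list(sequence.strip().upper())


def tokenize_smiles(smiles):
    """Split a SMILES string into atom, bond, branch, and ring tokens."""
    return SMILES_TOKEN.findall(smiles.strip())


def build_vocab(token_lists):
    """Assign an integer id to every distinct token, after two special ids."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Map tokens to ids, falling back to <unk> for unseen tokens."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]


# Hypothetical inputs: a short peptide fragment and aspirin as SMILES.
protein_tokens = tokenize_protein("MKTAYIAK")
ligand_tokens = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")

vocab = build_vocab([protein_tokens, ligand_tokens])
protein_ids = encode(protein_tokens, vocab)
ligand_ids = encode(ligand_tokens, vocab)
```

The resulting integer sequences are the kind of input that sequence models such as LSTMs or transformers typically consume after an embedding layer. Practical systems often use separate vocabularies for proteins and ligands, or learned subword tokenizers, rather than the single shared vocabulary used here for brevity.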
