Generalizable compound protein interaction prediction with a model incorporating protein structure aware and compound property aware language model representations
Yiming Zhang, Ryuichiro Ishitani, Mizuki Takemoto, Atsuhiro Tomita

TL;DR
The paper introduces GenSPARC, a deep learning model that improves compound-protein interaction (CPI) prediction by combining structure-aware protein representations with property-aware compound representations, with the aim of supporting drug discovery.
Contribution
GenSPARC introduces structure-aware protein representations and property-aware compound representations to improve the accuracy and generalizability of CPI prediction.
Findings
GenSPARC demonstrates strong generalizability across challenging CPI data splits.
The model achieves competitive results in virtual screening tasks.
Structure-aware and multimodal representations enhance interaction modeling.
Abstract
Compound–protein interaction (CPI) prediction plays a crucial role in drug discovery by helping to identify binding interactions and binding affinities between small molecules and proteins. Current deep learning models rely heavily on sequence-based representations and suffer from a lack of labeled data, which restricts their accuracy and generalizability. To overcome these challenges, we propose GenSPARC (a model with Generalized Structure- and Property-Aware Representations of protein and chemical language models for CPI prediction), a deep learning model that leverages structure-aware protein representations derived from AlphaFold2 predictions and FoldSeek’s three-dimensional interaction alphabet. Compound features are extracted using graph convolutional networks and a pretrained chemical language model, yielding a comprehensive multimodal compound representation. An attention mechanism further…
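The abstract names three ingredients: structure-aware protein tokens (amino acids plus FoldSeek 3Di states), a compound view built from a graph convolutional network and a chemical language model, and an attention mechanism linking the two. The NumPy sketch below shows how such pieces could fit together. It is not the authors' implementation: the function names (`structure_aware_tokens`, `gcn_layer`, `cross_attention`, `predict_interaction`) are hypothetical, the residue-plus-lowercase-3Di token scheme is an assumed convention, attention is used without learned projections, and random arrays stand in for real AlphaFold2/FoldSeek outputs and pretrained language-model embeddings.

```python
import numpy as np


def structure_aware_tokens(residues: str, foldseek_3di: str) -> list[str]:
    """Pair each residue with its FoldSeek 3Di state (hypothetical scheme:
    uppercase amino acid + lowercase 3Di letter, e.g. 'M' + 'd' -> 'Md')."""
    if len(residues) != len(foldseek_3di):
        raise ValueError("residue and 3Di strings must have equal length")
    return [aa.upper() + s.lower() for aa, s in zip(residues, foldseek_3di)]


def gcn_layer(adj: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degrees are >= 1
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ h @ w, 0.0)


def cross_attention(queries: np.ndarray, keys_values: np.ndarray):
    """Scaled dot-product attention without learned projections:
    each compound token attends over all protein residues."""
    scores = queries @ keys_values.T / np.sqrt(queries.shape[-1])
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ keys_values, weights


def predict_interaction(protein_emb, atom_feats, adj, chem_lm_emb, params):
    """Return an interaction probability and the attention map."""
    # Compound branch: GCN atom embeddings plus one token from a chemical LM.
    atoms = gcn_layer(adj, atom_feats, params["w_gcn"])
    compound = np.vstack([atoms, chem_lm_emb[None, :]])
    # Interaction: compound tokens attend over structure-aware residue embeddings.
    attended, weights = cross_attention(compound, protein_emb)
    pooled = np.concatenate([attended.mean(axis=0), protein_emb.mean(axis=0)])
    logit = pooled @ params["w_out"] + params["b_out"]
    return float(1.0 / (1.0 + np.exp(-logit))), weights


# Toy usage: random arrays stand in for real pretrained embeddings.
rng = np.random.default_rng(0)
d = 8
tokens = structure_aware_tokens("MKV", "dvp")   # ['Md', 'Kv', 'Vp']
protein_emb = rng.normal(size=(len(tokens), d)) # placeholder protein LM output
atom_feats = rng.normal(size=(4, 5))            # 4 atoms, 5 features each
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)     # a 4-atom chain
params = {"w_gcn": rng.normal(size=(5, d)),
          "w_out": rng.normal(size=2 * d),
          "b_out": 0.0}
prob, attn = predict_interaction(protein_emb, atom_feats, adj,
                                 rng.normal(size=d), params)
print(tokens, round(prob, 3), attn.shape)
```

Treating the GCN atoms and the language-model embedding as separate attention tokens keeps both compound views visible to the protein side; the mean pooling and single logistic output are simplifications chosen for brevity, and in a trained model every weight shown here as a random array would be learned.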
Figure 1
Figure 2
Figure 3
Taxonomy
Topics: Computational Drug Discovery Methods · Protein Structure and Dynamics · Bioinformatics and Genomic Networks
