An Interpretable Framework Applying Protein Words to Predict Protein-Small Molecule Complementary Pairing Rules

Jingke Chen; Jingrui Zhong; Tazneen Hossain Tani; Zidong Su; Xiaochun Zhang; Boxue Tian

arXiv:2604.16550 · cs.LG · April 21, 2026

TL;DR

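The paper introduces PWRules, an interpretable framework that pairs protein words (semantic sequence units) with small-molecule fragments to predict protein-ligand interactions. Its PWScore ranking function performs comparably to Glide and PSICHIC on benchmark datasets and applies to protein targets outside the training data.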

Contribution

PWRules is a novel interpretable method that identifies pairing rules between protein words and small molecule fragments for drug discovery.

Findings

01. PWScore performs comparably to the physics-based model Glide and the deep learning model PSICHIC on benchmark datasets.

02. PWScore applies to protein targets outside the training dataset, such as the SARS-CoV-2 main protease.

03. Learned rules are enriched near ligand-binding pockets.

Abstract

Despite the high accuracy of 'black box' deep learning models, drug discovery still relies on protein-ligand interaction principles and heuristics. To improve interpretability of protein-small molecule binding predictions, we developed the PWRules framework, which applies binding affinity data to identify privileged small molecule fragments and subsequently defines complementary pairing rules between these fragments and protein words (semantic sequence units) through an interpretability module. The resulting word-fragment rules are then ranked by the PWScore function to prioritize active compounds. Evaluations on benchmark datasets show that PWScore achieves competitive performance comparable to the physics-based model (Glide) and the deep learning model (PSICHIC) and shows broad applicability for protein targets outside the training dataset, e.g., SARS-CoV-2 main protease. Notably,…
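The ranking idea in the abstract (score a compound by the word-fragment rules it satisfies, then sort) can be illustrated with a toy sketch. This is not the authors' PWScore: the additive scoring, the function names `toy_rule_score` and `rank_compounds`, the placeholder words (`W1`, `W2`, `W3`), the placeholder fragments (`F1`, `F2`), and the rule weights are all assumptions made up for illustration.

```python
from typing import Dict, List, Set, Tuple

# A rule pairs one protein word with one small-molecule fragment.
Rule = Tuple[str, str]


def toy_rule_score(
    protein_words: Set[str],
    fragments: Set[str],
    rule_weights: Dict[Rule, float],
) -> float:
    """Sum the weights of every rule whose word occurs in the protein
    and whose fragment occurs in the compound (hypothetical scoring)."""
    return sum(
        weight
        for (word, fragment), weight in rule_weights.items()
        if word in protein_words and fragment in fragments
    )


def rank_compounds(
    protein_words: Set[str],
    compounds: Dict[str, Set[str]],
    rule_weights: Dict[Rule, float],
) -> List[Tuple[str, float]]:
    """Score each compound against one protein and sort by descending score."""
    scored = [
        (name, toy_rule_score(protein_words, frags, rule_weights))
        for name, frags in compounds.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical data: three rules, one protein containing two words,
# and three compounds described by the fragments they contain.
rule_weights = {("W1", "F1"): 1.5, ("W2", "F2"): 0.5, ("W3", "F1"): 2.0}
protein_words = {"W1", "W2"}
compounds = {"cmpd_a": {"F1"}, "cmpd_b": {"F2"}, "cmpd_c": {"F1", "F2"}}

ranking = rank_compounds(protein_words, compounds, rule_weights)
```

In this toy setup the rule `("W3", "F1")` never fires because the protein lacks `W3`, so each compound's score can be traced back to the specific rules that matched, which is the kind of interpretability the abstract describes.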
