Pharmacology Knowledge Graphs: Do We Need Chemical Structure for Drug Repurposing?
Youssef Abo-Dahab, Ruby Hernandez, Ismael Caleb Arechiga Duran

TL;DR
This study evaluates how model complexity, data volume, and feature modalities affect knowledge graph-based drug repurposing, and its results suggest that explicit chemical structure features may not be necessary for accurate predictions.
Contribution
It provides a rigorous temporal validation framework and shows that chemical structure features can be omitted without degrading performance in drug repurposing models.
Findings
Removing chemical structure encoders improved performance and reduced VRAM usage.
Increasing model size beyond 2.44 million parameters yields diminishing returns.
External validation confirmed 6 novel predictions as established therapeutic indications.
Abstract
The contributions of model complexity, data volume, and feature modalities to knowledge graph-based drug repurposing remain poorly quantified under rigorous temporal validation. We constructed a pharmacology knowledge graph from ChEMBL 36 comprising 5,348 entities including 3,127 drugs, 1,156 proteins, and 1,065 indications. A strict temporal split was enforced with training data up to 2022 and testing data from 2023 to 2025, together with biologically verified hard negatives mined from failed assays and clinical trials. We benchmarked five knowledge graph embedding models and a standard graph neural network with 3.44 million parameters that incorporates drug chemical structure using a graph attention encoder and ESM-2 protein embeddings. Scaling experiments ranging from 0.78 to 9.75 million parameters and from 25 to 100 percent of the data, together with feature ablation studies, were…
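The temporal split described in the abstract (training on associations recorded up to 2022, testing on those from 2023 to 2025) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Edge` record, its field names, the toy identifiers, and the `temporal_split` helper are all hypothetical. Dropping test triples that already appear in training is an added assumption, included here as one common way to avoid leakage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical knowledge graph edge; field names are illustrative only."""
    head: str      # e.g. a drug identifier
    relation: str  # e.g. "treats" or "targets"
    tail: str      # e.g. an indication or protein identifier
    year: int      # year the association was recorded


def temporal_split(edges, train_end=2022, test_start=2023, test_end=2025):
    """Split edges by year; drop test triples already present in training.

    The cutoff years follow the abstract. The de-duplication step is an
    assumption of this sketch, not something the source describes.
    """
    train = [e for e in edges if e.year <= train_end]
    seen = {(e.head, e.relation, e.tail) for e in train}
    test = [
        e for e in edges
        if test_start <= e.year <= test_end
        and (e.head, e.relation, e.tail) not in seen
    ]
    return train, test


# Toy data with made-up identifiers, purely for illustration.
edges = [
    Edge("drug_A", "treats", "indication_X", 2019),
    Edge("drug_A", "treats", "indication_X", 2024),  # repeats a training triple
    Edge("drug_B", "targets", "protein_P", 2022),
    Edge("drug_B", "treats", "indication_Y", 2023),
    Edge("drug_C", "treats", "indication_Z", 2025),
]

train_edges, test_edges = temporal_split(edges)
print(len(train_edges), len(test_edges))
```

Splitting by date rather than at random means the model is scored only on associations recorded after everything it was trained on, which is closer to how a repurposing prediction would be used in practice.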
Taxonomy
Topics: Computational Drug Discovery Methods · Advanced Graph Neural Networks · Bioinformatics and Genomic Networks
