A large dataset curation and benchmark for drug target interaction

Alex Golts; Vadim Ratner; Yoel Shoshan; Moshe Raboh; Sagi Polaczek; Michal Ozery-Flato; Daniel Shats; Liam Hazan; Sivan Ravid; Efrat Hexter

arXiv:2401.17174 · q-bio.BM · January 31, 2024 · 1 citation

TL;DR

This paper introduces a standardized large dataset for drug target interaction prediction, along with a benchmark protocol, to improve comparability and validity of computational drug discovery research.

Contribution

It presents a comprehensive data curation, standardization, and splitting strategy, along with an evaluation protocol for DTI prediction models.

Findings

01. The dataset enables consistent benchmarking across studies.

02. The benchmark protocol improves comparability of results.

03. Experimental validation confirms the dataset's usefulness.

Abstract

Bioactivity data plays a key role in drug discovery and repurposing. The resource-demanding nature of in vitro and in vivo experiments, together with recent advances in data-driven computational biochemistry research, highlights the importance of in silico drug target interaction (DTI) prediction approaches. While numerous large public bioactivity data sources exist, research in the field could benefit from better standardization of existing data resources. At present, research works that share similar goals are often difficult to compare properly because they choose different data sources and different train/validation/test split strategies. Additionally, many works are based on small data subsets, leading to results and insights of possibly limited validity. In this paper we propose a way to standardize and efficiently represent a very large dataset…
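To make the point about split strategies concrete, here is a minimal sketch of why two studies on the same data can report results that are hard to compare. It is not the paper's protocol: the function names (`random_split`, `cold_target_split`, `shared_targets`) and the toy drug and target identifiers are hypothetical and chosen only for illustration. A random split over interaction pairs lets the same targets appear in both training and test sets, while a "cold target" split holds out whole targets.

```python
import random


def random_split(pairs, test_frac=0.2, seed=0):
    """Split (drug, target) pairs at random, ignoring which entities they contain."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def cold_target_split(pairs, test_frac=0.2, seed=0):
    """Hold out whole targets so that no test target is seen during training."""
    rng = random.Random(seed)
    targets = sorted({target for _, target in pairs})
    rng.shuffle(targets)
    n_test = max(1, int(len(targets) * test_frac))
    held_out = set(targets[:n_test])
    train = [p for p in pairs if p[1] not in held_out]
    test = [p for p in pairs if p[1] in held_out]
    return train, test


def shared_targets(train, test):
    """Targets that occur in both the training and the test pairs."""
    return {t for _, t in train} & {t for _, t in test}


# Toy data (hypothetical): 20 drugs x 10 targets, every pair "measured".
pairs = [(f"drug{d}", f"target{t}") for d in range(20) for t in range(10)]

rand_train, rand_test = random_split(pairs)
cold_train, cold_test = cold_target_split(pairs)

print("random split, targets shared by train and test:",
      len(shared_targets(rand_train, rand_test)))
print("cold-target split, targets shared by train and test:",
      len(shared_targets(cold_train, cold_test)))
```

A model scored on the random split is tested on targets it has already seen, so its score is not directly comparable to one obtained on the cold-target split. Fixing the split as part of a shared benchmark removes this source of variation.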

Taxonomy

Topics: Computational Drug Discovery Methods · Cell Image Analysis Techniques · Biomedical Text Mining and Ontologies