Compound virtual screening by learning-to-rank with gradient boosting decision tree and enrichment-based cumulative gain

Kairi Furui; Masahito Ohue

arXiv:2205.02169·q-bio.BM·August 30, 2022


TL;DR

This paper shows that learning-to-rank with gradient boosting decision trees (GBDT) outperforms RankSVM and regression models in ligand-based compound virtual screening, and it introduces a new metric, NEDCG, that can identify predictions that are worse than random.

Contribution

The paper compares GBDT-based learning-to-rank methods with RankSVM and regression models, and proposes NEDCG to improve the assessment of compound screening performance.

Findings

01

GBDT with learning-to-rank outperforms RankSVM and regression models.

02

NEDCG effectively distinguishes worse-than-random predictions.

03

The proposed NEDCG provides a more accurate evaluation metric for compound ranking than NDCG.

Abstract

Learning-to-rank, a machine learning technique widely used in information retrieval, has recently been applied to the problem of ligand-based virtual screening, to accelerate the early stages of new drug development. Ranking prediction models learn based on ordinal relationships, making them suitable for integrating assay data from various environments. Existing studies of rank prediction in compound screening have generally used a learning-to-rank method called RankSVM. However, they have not been compared with or validated against the gradient boosting decision tree (GBDT)-based learning-to-rank methods that have gained popularity recently. Furthermore, although the ranking metric called Normalized Discounted Cumulative Gain (NDCG) is widely used in information retrieval, it only determines whether the predictions are better than those of other models. In other words, NDCG is…
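For readers unfamiliar with the NDCG metric mentioned in the abstract, the following is a minimal sketch of the standard information-retrieval formulation. It is not taken from the paper: the function names `dcg` and `ndcg`, the exponential gain `2^rel - 1`, the `log2` position discount, and the toy activity values are all assumptions made for illustration, and the paper's own NEDCG definition is not reproduced here.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain of relevance values given in ranked order.

    Assumption: exponential gain (2^rel - 1) and log2 position discount,
    one common convention in information retrieval.
    """
    top = relevances if k is None else relevances[:k]
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(top))


def ndcg(true_relevance, predicted_score, k=None):
    """NDCG: DCG of the predicted ordering divided by DCG of the ideal ordering."""
    # Rank compounds by predicted score, highest first.
    order = sorted(range(len(true_relevance)),
                   key=lambda i: predicted_score[i], reverse=True)
    ranked = [true_relevance[i] for i in order]
    ideal = sorted(true_relevance, reverse=True)
    ideal_dcg = dcg(ideal, k)
    return dcg(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical activity labels for four compounds (higher means more active).
activity = [3, 2, 0, 1]
perfect_scores = [0.9, 0.7, 0.1, 0.3]   # ranks compounds in the ideal order
reversed_scores = [0.1, 0.3, 0.9, 0.7]  # ranks compounds in the worst order

ndcg_perfect = ndcg(activity, perfect_scores)
ndcg_reversed = ndcg(activity, reversed_scores)
print(ndcg_perfect, ndcg_reversed)
```

In this toy case the fully reversed ranking still receives a score well above zero, so an NDCG value on its own does not say how a model compares with random ordering.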
