Compound virtual screening by learning-to-rank with gradient boosting decision tree and enrichment-based cumulative gain
Kairi Furui, Masahito Ohue

TL;DR
This paper shows that learning-to-rank with gradient boosting decision trees (GBDT) outperforms RankSVM and regression models in compound virtual screening, and it introduces a new metric, NEDCG, for evaluating ranking quality.
Contribution
The paper compares GBDT-based learning-to-rank methods with RankSVM and regression models, and proposes NEDCG for assessing compound screening performance.
Findings
GBDT with learning-to-rank outperforms RankSVM and regression models.
NEDCG effectively distinguishes worse-than-random predictions.
The proposed NEDCG provides a more accurate evaluation metric for compound ranking.
Abstract
Learning-to-rank, a machine learning technique widely used in information retrieval, has recently been applied to the problem of ligand-based virtual screening, to accelerate the early stages of new drug development. Ranking prediction models learn based on ordinal relationships, making them suitable for integrating assay data from various environments. Existing studies of rank prediction in compound screening have generally used a learning-to-rank method called RankSVM. However, they have not been compared with or validated against the gradient boosting decision tree (GBDT)-based learning-to-rank methods that have gained popularity recently. Furthermore, although the ranking metric called Normalized Discounted Cumulative Gain (NDCG) is widely used in information retrieval, it only determines whether the predictions are better than those of other models. In other words, NDCG is…
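The abstract's point about NDCG can be illustrated with a small sketch. The code below computes standard NDCG (exponential gain, log2 discount) for a perfect ranking, a fully reversed ranking, and the expected value under a random ordering. It is not the paper's NEDCG, whose definition is not given in the text above. The activity grades, the scores, and the function names are hypothetical and chosen only for illustration.

```python
import math

def dcg(gains):
    """DCG of relevance grades listed in predicted rank order (top first)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(true_gains, scores):
    """Standard NDCG: DCG of the predicted order divided by the ideal DCG."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [true_gains[i] for i in order]
    ideal = sorted(true_gains, reverse=True)
    return dcg(ranked) / dcg(ideal)

def expected_random_ndcg(true_gains):
    """Expected NDCG of a uniformly random ordering.

    By linearity of expectation, every rank position receives the mean gain.
    """
    n = len(true_gains)
    mean_gain = sum(2 ** g - 1 for g in true_gains) / n
    expected_dcg = mean_gain * sum(1 / math.log2(i + 2) for i in range(n))
    return expected_dcg / dcg(sorted(true_gains, reverse=True))

# Hypothetical activity grades for six compounds (higher = more active).
true_gains = [3, 2, 2, 1, 0, 0]
perfect_scores = [6, 5, 4, 3, 2, 1]   # ranks the compounds correctly
reversed_scores = [1, 2, 3, 4, 5, 6]  # ranks them exactly backwards

ndcg_perfect = ndcg(true_gains, perfect_scores)
ndcg_reversed = ndcg(true_gains, reversed_scores)
ndcg_random = expected_random_ndcg(true_gains)

print(f"perfect:  {ndcg_perfect:.3f}")
print(f"reversed: {ndcg_reversed:.3f}")
print(f"random:   {ndcg_random:.3f}")
```

Even the fully reversed ranking receives a positive NDCG, so a raw NDCG value does not by itself show whether a model is better or worse than random. Comparing it against the random-ordering expectation makes that visible.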
