# Most Ligand-Based Classification Benchmarks Reward Memorization Rather than Generalization

**Authors:** Izhar Wallach, Abraham Heifets

arXiv: 1706.06619 · 2018-05-11

## TL;DR

This paper introduces AVE, a measure of training-validation redundancy in ligand-based classification benchmarks, and argues that many reported successes may reflect overfitting to those benchmarks rather than prospective predictive power.

## Contribution

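The study proposes AVE, a metric that quantifies training-validation redundancy using the similarity among inactive molecules as well as among actives. Across seven widely used benchmarks, the amount of AVE bias correlates strongly with the performance of ligand-based predictive methods, which points to overfitting as a possible explanation for that performance.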

## Key findings

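- AVE bias strongly correlates with the performance of ligand-based methods across seven widely used virtual screening and classification benchmarks
- The correlation holds irrespective of the predicted property, chemical fingerprint, similarity measure, or previously applied unbiasing techniques
- Previously reported performance of most ligand-based methods may be explained by overfitting to benchmarks rather than by good prospective accuracy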

## Abstract

Undetected overfitting can occur when there are significant redundancies between training and validation data. We describe AVE, a new measure of training-validation redundancy for ligand-based classification problems that accounts for the similarity amongst inactive molecules as well as active. We investigated seven widely-used benchmarks for virtual screening and classification, and show that the amount of AVE bias strongly correlates with the performance of ligand-based predictive methods irrespective of the predicted property, chemical fingerprint, similarity measure, or previously-applied unbiasing techniques. Therefore, it may be that the previously-reported performance of most ligand-based methods can be explained by overfitting to benchmarks rather than good prospective accuracy.
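To make the idea of a redundancy measure that looks at inactives as well as actives more concrete, here is a minimal Python sketch. It is not taken from this summary, and it should not be read as the paper's exact AVE definition, which is in the full text. It assumes a nearest-neighbor formulation: for each pair of validation and training sets, it averages over distance thresholds the fraction of validation molecules that have a training neighbor within the threshold, then combines the four pairings. The fingerprint representation (sets of on-bit indices), the Tanimoto distance, the threshold grid, and every function name are illustrative assumptions.

```python
from typing import FrozenSet, Sequence

# Hypothetical fingerprint representation: the set of "on" bit indices.
Fingerprint = FrozenSet[int]

# Assumed threshold grid (0.0, 0.1, ..., 1.0); not taken from the source.
DEFAULT_THRESHOLDS = tuple(i / 10 for i in range(11))


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Return 1 - |a & b| / |a | b|; two empty fingerprints count as identical."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def nn_coverage(
    validation: Sequence[Fingerprint],
    training: Sequence[Fingerprint],
    thresholds: Sequence[float] = DEFAULT_THRESHOLDS,
) -> float:
    """Average, over thresholds, of the fraction of validation molecules whose
    nearest training molecule lies within the threshold.

    Both sequences are assumed to be non-empty.
    """
    nearest = [min(tanimoto_distance(v, t) for t in training) for v in validation]
    fractions = [sum(d <= thr for d in nearest) / len(nearest) for thr in thresholds]
    return sum(fractions) / len(fractions)


def redundancy_bias(
    val_actives: Sequence[Fingerprint],
    val_inactives: Sequence[Fingerprint],
    train_actives: Sequence[Fingerprint],
    train_inactives: Sequence[Fingerprint],
    thresholds: Sequence[float] = DEFAULT_THRESHOLDS,
) -> float:
    """Illustrative bias score: how much closer each validation class sits to
    its own training class than to the opposite training class."""
    aa = nn_coverage(val_actives, train_actives, thresholds)
    ai = nn_coverage(val_actives, train_inactives, thresholds)
    ii = nn_coverage(val_inactives, train_inactives, thresholds)
    ia = nn_coverage(val_inactives, train_actives, thresholds)
    return (aa - ai) + (ii - ia)
```

Under these assumptions, a score near zero means validation molecules are about equally close to both training classes. A large positive score means a nearest-neighbor lookup alone could separate the classes, which is the kind of redundancy the abstract warns about. The inactive terms (`ii` and `ia`) are what distinguish this from a check that only compares actives with actives.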

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.06619/full.md

---
Source: https://tomesphere.com/paper/1706.06619