On the Unreported-Profile-is-Negative Assumption for Predictive Cheminformatics

Chao Lan; Sai Nivedita Chandrasekaran; Jun Huan

arXiv:1704.01184 · cs.LG · August 9, 2017

Open Access

TL;DR

This paper challenges the common assumption that unreported compound-target profiles are negative in cheminformatics, demonstrating that this assumption can harm predictive model performance and proposing a joint recovery and learning framework.

Contribution

It introduces a framework that jointly recovers unreported profiles and trains the predictive model, improving accuracy over models trained under the unreported-is-negative assumption.

Findings

01

Prediction performance degrades when unreported profiles are assumed negative.

02

Explicit recovery of unreported profiles enhances prediction accuracy.

03

A joint recovery-and-learning framework improves model performance further.
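
To make the idea of recovering unreported profiles concrete, here is a minimal sketch in Python. It is not the authors' framework: it substitutes a plain truncated SVD for whatever matrix recovery method the paper uses, and the toy matrix, the function name `low_rank_scores`, and the rank are all assumptions chosen for illustration. Two groups of compounds bind two disjoint groups of targets, and one true positive is deliberately left unreported.

```python
import numpy as np

def low_rank_scores(profiles: np.ndarray, rank: int) -> np.ndarray:
    """Return a rank-`rank` approximation of a compound-by-target matrix.

    Hypothetical helper: a truncated SVD used as a simple stand-in for a
    matrix recovery method. Unreported entries are zero in `profiles`.
    """
    U, s, Vt = np.linalg.svd(profiles, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

# Toy ground truth (assumed): compounds 0-2 bind targets 0-2,
# compounds 3-5 bind targets 3-5.
truth = np.zeros((6, 6))
truth[:3, :3] = 1.0
truth[3:, 3:] = 1.0

# Positive-only repository: the true positive at (0, 0) is unreported,
# so the unreported-is-negative assumption stores it as 0.
reported = truth.copy()
reported[0, 0] = 0.0

scores = low_rank_scores(reported, rank=2)

# The hidden positive gets a clearly higher score than a true negative.
print("hidden positive (0, 0):", round(float(scores[0, 0]), 3))
print("true negative   (0, 3):", round(float(scores[0, 3]), 3))
```

Under these assumptions the low-rank structure of the reported positives pulls the hidden entry back toward 1, while a cross-group true negative stays near 0. A downstream model trained on `scores` rather than on `reported` would therefore see a feature closer to the truth for compound 0.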

Abstract

In cheminformatics, compound-target binding profiles have been a main source of data for research. For data repositories that only provide positive profiles, a popular assumption is that unreported profiles are all negative. In this paper, we caution readers not to take this assumption for granted, and present empirical evidence of its ineffectiveness from a machine learning perspective. Our examination is based on a setting where binding profiles are used as features to train predictive models; we show (1) prediction performance degrades when the assumption fails and (2) explicit recovery of unreported profiles improves prediction performance. In particular, we propose a framework that jointly recovers profiles and learns a predictive model, and show it achieves further performance improvement. The presented study not only suggests applying matrix recovery methods to recover unreported…

Taxonomy

Topics: Machine Learning and Data Classification · Computational Drug Discovery Methods · Imbalanced Data Classification Techniques