Robust Molecular Property Prediction via Densifying Scarce Labeled Data
Jina Kim, Jeffrey Willette, Bruno Andreis, Sung Ju Hwang

TL;DR
This paper introduces a bilevel optimization method that uses unlabeled data to improve molecular property prediction, especially for out-of-distribution compounds, addressing covariate shift and data scarcity issues in drug discovery.
Contribution
A bilevel optimization approach that leverages unlabeled data to improve generalization beyond the training distribution in molecular property prediction.
Findings
Significant performance improvements on real-world datasets with covariate shift.
Visual evidence from t-SNE shows effective interpolation between ID and OOD data.
Enhanced robustness of molecular property predictions in scarce data scenarios.
Abstract
A widely recognized limitation of molecular prediction models is their reliance on structures observed in the training data, resulting in poor generalization to out-of-distribution compounds. Yet in drug discovery, the compounds most critical for advancing research often lie beyond the training set, making the bias toward the training data particularly problematic. This mismatch introduces substantial covariate shift, under which standard deep learning models produce unstable and inaccurate predictions. Furthermore, the scarcity of labeled data, stemming from the onerous and costly nature of experimental validation, exacerbates the difficulty of achieving reliable generalization. To address these limitations, we propose a novel bilevel optimization approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data, enabling the…
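The abstract is cut off before it describes the method, so the following is only a generic toy illustration of what "bilevel optimization" means, not the authors' algorithm. It assumes scalar features, a one-parameter ridge regressor as the inner problem, and a single hypothetical mixing coefficient `lam` that pulls labeled inputs toward the mean of an unlabeled pool. The outer loop tunes `lam` against a held-out loss using a finite-difference gradient. All function names here are invented for the sketch.

```python
def mix(xs, pool_mean, lam):
    """Interpolate each labeled input toward the unlabeled pool mean (toy assumption)."""
    return [(1.0 - lam) * x + lam * pool_mean for x in xs]


def inner_solve(xs, ys, ridge=1e-2):
    """Inner problem: closed-form 1-D ridge regression weight (no intercept)."""
    num = sum(x * y for x, y in zip(xs, ys))
    den = sum(x * x for x in xs) + ridge
    return num / den


def outer_loss(lam, train, pool_mean, val):
    """Outer objective: held-out MSE of the inner-optimal model for a given lam."""
    xs, ys = train
    w = inner_solve(mix(xs, pool_mean, lam), ys)
    vx, vy = val
    errors = [(w * x - y) ** 2 for x, y in zip(vx, vy)]
    return sum(errors) / len(errors)


def bilevel_fit(train, unlabeled, val, steps=50, lr=0.05, eps=1e-4):
    """Alternate between solving the inner problem and updating lam on the outer loss."""
    pool_mean = sum(unlabeled) / len(unlabeled)
    lam = 0.0
    best_lam = lam
    best_loss = outer_loss(lam, train, pool_mean, val)
    for _ in range(steps):
        # Finite-difference estimate of d(outer loss)/d(lam); the inner problem
        # is re-solved exactly inside outer_loss at each perturbed value.
        grad = (outer_loss(lam + eps, train, pool_mean, val)
                - outer_loss(lam - eps, train, pool_mean, val)) / (2.0 * eps)
        lam = min(max(lam - lr * grad, 0.0), 0.9)  # keep lam in a bounded range
        loss = outer_loss(lam, train, pool_mean, val)
        if loss < best_loss:
            best_lam, best_loss = lam, loss
    w = inner_solve(mix(train[0], pool_mean, best_lam), train[1])
    return best_lam, w, best_loss


if __name__ == "__main__":
    # Made-up numbers purely for illustration; they are not from the paper.
    train = ([1.0, 2.0, 3.0], [1.1, 2.3, 3.2])
    unlabeled = [4.0, 5.0, 6.0]
    val = ([5.0, 6.0], [5.4, 6.5])
    print(bilevel_fit(train, unlabeled, val))
```

The nesting is the point of the sketch: the outer variable is judged only through the model that the inner problem returns for it. A practical system would replace the closed-form inner solve with neural network training and the finite-difference step with a proper hypergradient.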
Taxonomy
Topics: Computational Drug Discovery Methods · Analytical Chemistry and Chromatography · Metabolomics and Mass Spectrometry Studies
