Two-Stage Pretraining for Molecular Property Prediction in the Wild
Kevin Tirta Wijaya, Minghao Guo, Michael Sun, Hans-Peter Seidel, Wojciech Matusik, Vahid Babaei

TL;DR
MoleVers is a two-stage pretrained molecular model that effectively predicts various properties in data-scarce real-world scenarios by combining unlabeled data learning with auxiliary property refinement.
Contribution
Introduces MoleVers, a novel two-stage pretraining approach with a branching encoder and dynamic noise sampling for improved molecular property prediction.
Findings
Achieves state-of-the-art results on 22 datasets.
Effective in low-label, real-world applications.
Demonstrates generalization across diverse molecular properties.
Abstract
Molecular deep learning models have achieved remarkable success in property prediction, but they often require large amounts of labeled data. The challenge is that, in real-world applications, labels are extremely scarce, as obtaining them through laboratory experimentation is both expensive and time-consuming. In this work, we introduce MoleVers, a versatile pretrained molecular model designed for various types of molecular property prediction in the wild, i.e., where experimentally-validated labels are scarce. MoleVers employs a two-stage pretraining strategy. In the first stage, it learns molecular representations from unlabeled data through masked atom prediction and extreme denoising, a novel task enabled by our newly introduced branching encoder architecture and dynamic noise scale sampling. In the second stage, the model refines these representations through predictions of…
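The first-stage objectives named in the abstract can be illustrated with a small data-corruption sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the mask ratio, the noise distribution, or the range of noise scales, so `mask_ratio`, the uniform range in `sample_noise_scale`, the Gaussian noise, and all function names here are hypothetical. The branching encoder is not modeled. The sketch only shows how one molecule might be corrupted to produce targets for masked atom prediction and coordinate denoising, with a noise scale drawn anew for every sample.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical placeholder for a masked atom type


def sample_noise_scale(rng, low=0.01, high=1.0):
    """Draw a per-molecule noise scale (assumed uniform range, not from the paper)."""
    return rng.uniform(low, high)


def corrupt_molecule(atoms, coords, rng, mask_ratio=0.15):
    """Corrupt one molecule and return inputs and targets for both tasks.

    atoms  : list of element symbols, e.g. ["O", "H", "H"]
    coords : list of [x, y, z] positions, one per atom
    """
    n = len(atoms)
    # Mask at least one atom so the masked-atom loss is always defined.
    n_mask = max(1, round(mask_ratio * n))
    masked_idx = sorted(rng.sample(range(n), n_mask))

    masked_atoms = list(atoms)
    for i in masked_idx:
        masked_atoms[i] = MASK_TOKEN

    # Dynamic noise scale: a new sigma is sampled for every molecule.
    sigma = sample_noise_scale(rng)
    noise = [[rng.gauss(0.0, sigma) for _ in range(3)] for _ in range(n)]
    noisy_coords = [
        [c + e for c, e in zip(xyz, eps)] for xyz, eps in zip(coords, noise)
    ]

    return {
        "masked_atoms": masked_atoms,   # model input (atom types)
        "noisy_coords": noisy_coords,   # model input (positions)
        "masked_idx": masked_idx,
        "atom_targets": [atoms[i] for i in masked_idx],  # masked atom prediction
        "noise_targets": noise,         # denoising target
        "sigma": sigma,
    }


if __name__ == "__main__":
    rng = random.Random(0)
    water = ["O", "H", "H"]
    xyz = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
    sample = corrupt_molecule(water, xyz, rng)
    print(sample["masked_atoms"], round(sample["sigma"], 3))
```

A model trained on such samples would predict `atom_targets` at the masked positions and `noise_targets` from the noisy coordinates. Sampling the scale per molecule, rather than fixing it, exposes the model to both small and large perturbations.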
Peer Reviews
Decision · Submitted to ICLR 2025
(1) The presentation of this paper is quite clear. The major idea is quite straightforward. It can be expected that better denoising frameworks and weakly supervised learning can be effective for enhancing few-shot molecular property prediction performance.
(2) Comprehensive experiments have been covered in this work, and a new large-scale benchmark dataset has been introduced for evaluation.
(1) The algorithmic contribution appears somewhat limited. Firstly, both the denoising pretraining strategy and masking strategy have been explored in previous works. Therefore, this study primarily concentrates on the interaction between these two modules during the pretraining stage, which lacks novelty. In the second stage, the effectiveness of weakly supervised learning is not surprising, provided the computed coarse labels are essentially accurate. Overall, the efficacy of the proposed stra
- **Valuable Research Question**: The research question addressed in this paper focuses on an area in the field that has lacked sufficient attention and remains underexplored. The authors provide a feasible solution for effective property prediction in low-data scenarios.
- **Benchmark Contribution**: The MPPW benchmark is a valuable addition to the field, as it reflects real-world scenarios where labeled molecular data is often scarce.
- **Strong Empirical Results**: MoleVers achieves consist
**Weaknesses:**
- My primary concern with this paper is the lack of technical novelty. Although the authors propose a two-stage pretraining approach, each pretraining stage employs widely used methods, such as MAP and dynamic denoising. Thus, in terms of technical contribution, this work may lean more toward engineering improvements.
- Some experimental setups and procedures lack sufficient motivation and explanation:
  1. In Section 3.2, the authors mention selecting HOMO, LUMO, and Dipole
1. The writing is clear and easy to understand.
2. Experimental settings are well-detailed, with fair comparisons based on a shared pretraining dataset, making the results convincing.
3. Addressing the challenge of limited labeled data for biological and chemical property prediction is crucial, and the two-stage approach provides an effective solution that could inspire further research in this area.
1. Given that each task in the proposed MPPW benchmark includes only 38–123 molecules, there is a risk that the results are highly variable and not robust. However, I did not see any mention of repeated experiments or the reporting of standard deviations. I suggest the authors perform multiple runs and report the mean and standard deviation to make the results in Table 1 more robust and convincing.
2. The authors propose a dynamic denoising pretraining strategy and a branching network design, but no ab
Taxonomy
Topics: Diatoms and Algae Research
