TL;DR
This study systematically evaluates how increasing training resources for chemical language models affects their transfer performance on molecular property prediction tasks, revealing limited improvements despite better pretraining metrics.
Contribution
It challenges the assumption that larger models and datasets automatically lead to better downstream performance in chemical language modeling.
Findings
Pretraining loss decreases with more resources, but downstream performance plateaus.
Alternative metrics like Hessian-based measures do not predict downstream success.
Downstream performance can saturate or degrade even as pretraining metrics improve.
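One way to check the claim that pretraining loss fails to track downstream performance is a rank correlation across model scales. The following is a minimal sketch, not the authors' evaluation code: it assumes Spearman correlation as the measure, and the function names and the numbers in the demo are hypothetical placeholders, not results from the paper.

```python
import math
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # HYPOTHETICAL placeholder values, one entry per model scale.
    # These are NOT numbers reported in the paper.
    pretrain_loss = [0.90, 0.70, 0.55, 0.45, 0.40, 0.38]
    downstream_score = [0.71, 0.74, 0.75, 0.75, 0.74, 0.75]
    print(f"Spearman rho: {spearman(pretrain_loss, downstream_score):.3f}")
```

If lower pretraining loss reliably meant better transfer, the coefficient would sit near -1 for a higher-is-better downstream score. A value much closer to 0 would be consistent with the plateau described above.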
Abstract
Chemical Language Models (CLMs) pretrained on large-scale molecular data are widely used for molecular property prediction. However, the common belief that increasing training resources such as model size, dataset size, and training compute improves both pretraining loss and downstream task performance has not been systematically validated in the chemical domain. In this work, we evaluate this assumption by pretraining CLMs while scaling training resources and measuring transfer performance across diverse molecular property prediction (MPP) tasks. We find that while pretraining loss consistently decreases with increased training resources, downstream task performance shows limited improvement. Moreover, alternative metrics based on the Hessian or loss landscape also fail to estimate downstream performance in CLMs. We further identify conditions under which downstream performance…
Code & Models
- 🤗 sagawa/molscaletransfer-chemlm-4.87m · model · 42 downloads
- 🤗 sagawa/molscaletransfer-chemlm-0.06m · model · 1.6k downloads
- 🤗 sagawa/molscaletransfer-chemlm-0.83m · model · 1.6k downloads
- 🤗 sagawa/molscaletransfer-chemlm-2.30m · model · 1.6k downloads
- 🤗 sagawa/molscaletransfer-chemlm-25.75m · model · 21 downloads
- 🤗 sagawa/molscaletransfer-chemlm-86.24m · model · 31 downloads
