How Well Do Large-Scale Chemical Language Models Transfer to Downstream Tasks?

Tatsuya Sagawa; Ryosuke Kojima

arXiv:2602.11618·cs.LG·May 14, 2026

6 Models

TL;DR

This study systematically evaluates how scaling training resources (model size, dataset size, and training compute) for chemical language models affects transfer performance on molecular property prediction tasks. Pretraining loss keeps improving, but downstream performance improves only marginally.

Contribution

The paper challenges the assumption that larger models and datasets automatically lead to better downstream performance in chemical language modeling.

Findings

01. Pretraining loss decreases with more resources, but downstream performance plateaus.

02. Alternative metrics like Hessian-based measures do not predict downstream success.

03. Downstream performance can saturate or degrade even as pretraining metrics improve.
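
One way to check whether a pretraining metric tracks downstream performance, as the findings above question, is a rank correlation across checkpoints. The sketch below is an illustration only, not the authors' analysis: the choice of Spearman correlation, the function names `ranks` and `spearman`, and the numbers are all assumptions, and the values are hypothetical placeholders rather than results from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical placeholder values, NOT results from the paper:
# one entry per checkpoint, ordered by increasing training resources.
pretrain_loss = [0.90, 0.75, 0.62, 0.55, 0.50]
downstream_auc = [0.71, 0.74, 0.75, 0.74, 0.73]

rho = spearman(pretrain_loss, downstream_auc)
print(f"Spearman rho between pretraining loss and downstream AUC: {rho:.3f}")
```

If lower pretraining loss reliably meant better transfer, rho would sit near -1 for a higher-is-better downstream metric. A value near zero would be consistent with the plateau described in the findings.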

Abstract

Chemical Language Models (CLMs) pre-trained on large-scale molecular data are widely used for molecular property prediction. However, the common belief that increasing training resources such as model size, dataset size, and training compute improves both pretraining loss and downstream task performance has not been systematically validated in the chemical domain. In this work, we evaluate this assumption by pretraining CLMs while scaling training resources and measuring transfer performance across diverse molecular property prediction (MPP) tasks. We find that while pretraining loss consistently decreases with increased training resources, downstream task performance shows limited improvement. Moreover, alternative metrics based on the Hessian or loss landscape also fail to estimate downstream performance in CLMs. We further identify conditions under which downstream performance…
