Superior Molecular Representations from Intermediate Encoder Layers

Luis Pinto

arXiv:2506.06443 · cs.LG · October 16, 2025

TL;DR

This paper shows that embeddings from intermediate encoder layers of pretrained molecular models, rather than only the final layer, can improve both the performance and the efficiency of property prediction.

Contribution

It analyzes information flow across the layers of five molecular encoders and shows empirically, on 22 property prediction tasks, that intermediate layers benefit downstream molecular tasks.

Findings

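01 Intermediate layers retain more general-purpose features, while the final layer specializes and compresses information.

02 Frozen embeddings from the best intermediate layer improve downstream performance by up to 28.6% (5.4% on average) over the final layer.

03 Finetuning encoders truncated at intermediate depths yields gains of up to 40.8% (8.5% on average).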
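
To make the two mechanics behind these findings concrete (reading out every layer's output, and truncating an encoder at an intermediate depth), here is a minimal sketch in plain Python. It is not the paper's code: the "encoder" is a hypothetical stack of toy layers, and the names `run_with_hidden_states`, `truncate`, and `make_toy_layer` are invented for illustration. A real pretrained molecular encoder would replace the toy layers.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run_with_hidden_states(layers: Sequence[Layer], x: Vector) -> List[Vector]:
    """Apply the layers in order and return the output of every layer."""
    hidden_states = []
    for layer in layers:
        x = layer(x)
        hidden_states.append(x)
    return hidden_states


def truncate(layers: Sequence[Layer], depth: int) -> List[Layer]:
    """Keep only the first `depth` layers of the stack."""
    if not 1 <= depth <= len(layers):
        raise ValueError(f"depth must be in [1, {len(layers)}], got {depth}")
    return list(layers[:depth])


def make_toy_layer(offset: float) -> Layer:
    """Hypothetical stand-in for an encoder layer: adds `offset` to each element."""
    return lambda v: [value + offset for value in v]


# Assumed 4-layer toy encoder; layer i adds i + 1 to every element.
toy_encoder = [make_toy_layer(float(i + 1)) for i in range(4)]

# Frozen-embedding setting: one forward pass exposes every layer's output.
all_states = run_with_hidden_states(toy_encoder, [0.0, 1.0])

# Truncated setting: drop the later layers, then use the new last layer.
truncated = truncate(toy_encoder, depth=2)
truncated_output = run_with_hidden_states(truncated, [0.0, 1.0])[-1]
```

In this sketch the truncated encoder's output equals the full encoder's second hidden state, which is why a truncated encoder needs fewer layers at inference time than one that runs to the final layer.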

Abstract

Pretrained molecular encoders have become indispensable in computational chemistry for tasks such as property prediction and molecular generation. However, the standard practice of relying solely on final-layer embeddings for downstream tasks may discard valuable information. In this work, we first analyze the information flow in five diverse molecular encoders and find that intermediate layers retain more general-purpose features, whereas the final layer specializes and compresses information. We then perform an empirical layer-wise evaluation across 22 property prediction tasks. We find that using frozen embeddings from optimal intermediate layers improves downstream performance by an average of 5.4%, and by up to 28.6%, compared to the final layer. Furthermore, finetuning encoders truncated at intermediate depths achieves even greater average improvements of 8.5%, with increases as high as…
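
The layer-wise comparison described in the abstract can be sketched as a small calculation: for each task, pick the layer with the best score and express its gain relative to the final layer, then average over tasks. The sketch below is an assumption-laden illustration, not the paper's protocol. The excerpt does not define how the percentage improvement is computed, so a simple relative change is assumed here, and the task names and scores are invented toy values.

```python
from statistics import mean
from typing import Dict, List, Tuple


def best_layer(scores: List[float], higher_is_better: bool) -> int:
    """Index of the best-scoring layer (the first one wins on ties)."""
    pick = max if higher_is_better else min
    return pick(range(len(scores)), key=lambda i: scores[i])


def relative_gain(scores: List[float], higher_is_better: bool) -> float:
    """Percent improvement of the best layer over the final layer.

    Assumed definition: relative change with respect to the final-layer score,
    with the sign flipped for error metrics where lower is better.
    """
    final = scores[-1]
    best = scores[best_layer(scores, higher_is_better)]
    delta = best - final if higher_is_better else final - best
    return 100.0 * delta / abs(final)


# Hypothetical per-layer scores (last entry = final layer); not from the paper.
toy_tasks: Dict[str, Tuple[List[float], bool]] = {
    "toy_classification_auc": ([0.70, 0.80, 0.84, 0.80], True),
    "toy_regression_rmse": ([1.20, 0.90, 1.00, 1.00], False),
}

gains = {name: relative_gain(s, hib) for name, (s, hib) in toy_tasks.items()}
average_gain = mean(gains.values())
```

Note that choosing the best layer on the same data used to report the gain would be optimistic; in practice the layer would be selected on a validation split, which this sketch leaves out.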

Code & Models

Repositories

luispintoc/unlocking-chemical-insights (PyTorch, official)

Taxonomy

Topics: Thermodynamics and Calorimetric Analyses · Computational Drug Discovery Methods · Machine Learning in Materials Science