Circuits, Features, and Heuristics in Molecular Transformers

Kristof Varadi; Mark Marosi; Peter Antal

arXiv:2512.09757·cs.LG·December 11, 2025

Open Access · 3 Reviews

TL;DR

This paper analyzes the internal mechanisms by which autoregressive transformers generate valid chemical structures. It identifies computational patterns consistent with syntactic parsing and chemical validity constraints, and shows that these mechanistic insights can translate to predictive performance on downstream tasks.

Contribution

It provides a mechanistic analysis of molecular transformers, identifying computational patterns and feature representations that explain their ability to generate valid molecules.

Findings

01

Transformers exhibit low-level syntactic parsing patterns.

02

They capture abstract chemical validity constraints.

03

Mechanistic insights can translate to predictive performance on downstream tasks.
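
According to the reviews, the syntactic bookkeeping behind finding 01 includes ring opening and closing and branch matching in SMILES strings. The sketch below shows, as a hand-written rule, the kind of constraint such a model has to track. It is an illustration only, not the paper's method: the function name `check_smiles_balance` is hypothetical, and the check covers only parenthesis balance and ring-closure label pairing, not full SMILES validity or valence.

```python
DIGITS = "0123456789"


def check_smiles_balance(smiles: str) -> bool:
    """Return True if branches and ring-closure labels are balanced.

    Simplified illustration: ignores valence, aromaticity, and bond rules.
    """
    depth = 0           # current branch nesting depth
    open_rings = set()  # ring labels opened but not yet closed
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip bracket atoms: digits inside are isotopes, H counts, charges.
            end = smiles.find("]", i)
            if end == -1:
                return False
            i = end + 1
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # branch closed before it was opened
        elif ch == "%":
            # Two-digit ring-closure label, e.g. %12
            label = smiles[i + 1:i + 3]
            if len(label) != 2 or any(c not in DIGITS for c in label):
                return False
            open_rings ^= {int(label)}  # toggle: first use opens, second closes
            i += 3
            continue
        elif ch in DIGITS:
            open_rings ^= {int(ch)}
        i += 1
    return depth == 0 and not open_rings
```

For example, `check_smiles_balance("c1ccccc1")` accepts benzene, while `check_smiles_balance("c1ccccc")` rejects the string because ring label 1 is never closed.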

Abstract

Transformers generate valid and diverse chemical structures, but little is known about the mechanisms that enable these models to capture the rules of molecular representation. We present a mechanistic analysis of autoregressive transformers trained on drug-like small molecules to reveal the computational structure underlying their capabilities across multiple levels of abstraction. We identify computational patterns consistent with low-level syntactic parsing and more abstract chemical validity constraints. Using sparse autoencoders (SAEs), we extract feature dictionaries associated with chemically relevant activation patterns. We validate our findings on downstream tasks and find that mechanistic insights can translate to predictive performance in various practical settings.
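
The abstract does not describe the SAE architecture. As a rough illustration of how a sparse autoencoder turns an activation vector into a dictionary of sparse features, the sketch below assumes a commonly used form: a ReLU encoder, a linear decoder, and a reconstruction loss with an L1 sparsity penalty. The class name `TinySAE`, the `l1_coeff` parameter, and the dimensions are all hypothetical, and the weights are random rather than trained. It is written in plain Python to stay dependency-free; a real implementation would use a tensor library.

```python
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class TinySAE:
    """Toy sparse autoencoder (assumed form, not the paper's exact setup)."""

    def __init__(self, d_model, d_dict, seed=0):
        rng = random.Random(seed)
        # Encoder maps d_model activations to d_dict feature pre-activations.
        self.W_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)]
                      for _ in range(d_dict)]
        self.b_enc = [0.0] * d_dict
        # Decoder maps d_dict features back to d_model activations.
        self.W_dec = [[rng.gauss(0, 0.1) for _ in range(d_dict)]
                      for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        centered = [xi - b for xi, b in zip(x, self.b_dec)]
        pre = matvec(self.W_enc, centered)
        # ReLU keeps feature activations non-negative.
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, f):
        return [r + b for r, b in zip(matvec(self.W_dec, f), self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        f = self.encode(x)
        x_hat = self.decode(f)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        # L1 penalty on feature activations encourages sparsity.
        return mse + l1_coeff * sum(abs(a) for a in f)
```

In use, `TinySAE(d_model=4, d_dict=16).encode(x)` returns 16 non-negative feature activations for a 4-dimensional activation vector `x`. After training, individual features would be the candidates to compare against chemically relevant patterns.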

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 8 · Confidence 4

Strengths

1. The paper provides a novel perspective on the mechanistic interpretability of CLMs, focusing on syntactic rules such as ring opening and closing and valence budgeting.

2. Although the use of SAEs for interpretability in LMs is not new, the approach proposed here, which leverages SAEs with SMARTS patterns, is novel to the best of my knowledge.

3. The analysis of the SAE features is fair and the limitations are properly acknowledged, and it is interesting that there is still a need for human expert selection; desp

Weaknesses

1. Statistical robustness: the analysis of causal impact in Sections 3.1 and 3.2 lacks an appropriate statistical analysis. I'm not entirely sure which statistical test would be most appropriate, but considering the number of implicit hypotheses being tested, I think it is important to ensure that the results are not spurious. Similarly, Table 1 and Figure 2 should contain dispersion metrics (or true confidence intervals) calculated through different samples to show the uncertai

Reviewer 02 · Rating 4 · Confidence 2

Strengths

1. The work focuses on internal mechanisms related to SMILES syntax handling (e.g., ring closure and branch matching), which is directly relevant to the problem of generating syntactically valid molecules. This aligns with concerns in molecular generation research, where validity constraints are important.

2. The identification of a linearly readable representation related to valence capacity in the residual stream is interesting and may offer conceptual hints for improving structural consisten

Weaknesses

1. Limited connection to practical improvements in molecular generation. Although the paper reveals mechanisms for syntax and valence representation, it is not yet clear how these findings could be used to improve molecular generative performance (e.g., validity, novelty, or property-aware design). The interpretability results currently feel more diagnostic than actionable.

2. Interpretability of SAE-derived features varies considerably. While some features appear to align with recognizable che

Reviewer 03 · Rating 4 · Confidence 5

Strengths

The paper addresses an important task: understanding how deep learning architectures mechanistically solve inference tasks in a way that is congruent with the scientific structure of the domain and can be assessed by subject matter experts. The application of sparse autoencoders clearly helps with feature engineering for pharmacological tasks.

Weaknesses

The authors are unreasonably generous with the term "chemical reasoning". Neither the model they chose nor the analysis they performed rises to the level of reasoning; both show only good old correlations. It is not obvious what exactly one gains from the mechanistic analysis performed on SMILES. SMILES has a peculiar syntax, and any model that handles SMILES has to be able to deal with it. We know that transformers can deal with SMILES; what has changed in our understanding once we learned which head trac

Taxonomy

Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Advanced Graph Neural Networks