Circuits, Features, and Heuristics in Molecular Transformers
Kristof Varadi, Mark Marosi, Peter Antal

TL;DR
This paper investigates how autoregressive transformers generate valid chemical structures by analyzing their underlying mechanisms, revealing patterns related to syntax and chemical validity, and demonstrating how these insights improve downstream task performance.
Contribution
It provides a mechanistic analysis of molecular transformers, identifying computational patterns and feature representations that explain their ability to generate valid molecules.
Findings
Transformers exhibit low-level syntactic parsing patterns.
They capture abstract chemical validity constraints.
Mechanistic insights can translate into predictive performance on downstream tasks.
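To make the "low-level syntactic parsing" finding concrete, here is a minimal sketch of the bookkeeping a SMILES string demands: every branch opened with `(` must be closed, and every ring-closure label must appear in pairs. This is an illustrative checker, not the authors' analysis code; the function name `smiles_syntax_balanced` is hypothetical, and the check is deliberately simplified (it ignores atoms, bond symbols, and valence).

```python
def smiles_syntax_balanced(smiles: str) -> bool:
    """Return True if branches and ring-closure labels are balanced.

    Simplified illustration: does not validate atoms, bonds, or valence.
    """
    depth = 0           # current branch nesting depth
    open_rings = set()  # ring-closure labels opened but not yet closed
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip bracket atoms: digits inside are isotopes, H counts, charges.
            end = smiles.find("]", i)
            if end == -1:
                return False
            i = end + 1
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing a branch that was never opened
                return False
        elif ch == "%":
            # Two-digit ring-closure label, e.g. %12
            label = smiles[i + 1:i + 3]
            if len(label) != 2 or not label.isdigit():
                return False
            open_rings ^= {int(label)}  # toggle: open on first use, close on second
            i += 2
        elif ch in "0123456789":
            open_rings ^= {int(ch)}
        i += 1
    return depth == 0 and not open_rings


print(smiles_syntax_balanced("c1ccccc1"))   # benzene: ring label 1 opened and closed
print(smiles_syntax_balanced("CC(C)(C)O"))  # tert-butanol: two balanced branches
print(smiles_syntax_balanced("c1ccccc"))    # ring label 1 never closed
```

An autoregressive model has to track the same state (open branches, pending ring labels) implicitly while emitting tokens one at a time, which is the kind of behavior the paper attributes to specific computational patterns.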
Abstract
Transformers generate valid and diverse chemical structures, but little is known about the mechanisms that enable these models to capture the rules of molecular representation. We present a mechanistic analysis of autoregressive transformers trained on drug-like small molecules to reveal the computational structure underlying their capabilities across multiple levels of abstraction. We identify computational patterns consistent with low-level syntactic parsing and more abstract chemical validity constraints. Using sparse autoencoders (SAEs), we extract feature dictionaries associated with chemically relevant activation patterns. We validate our findings on downstream tasks and find that mechanistic insights can translate to predictive performance in various practical settings.
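For readers unfamiliar with sparse autoencoders, the following is a minimal sketch of a common SAE formulation (ReLU encoder, linear decoder, L1 sparsity penalty) applied to a batch of activation vectors. The abstract does not specify the paper's SAE architecture, dimensions, or loss, so everything here is an assumption: the names `sae_forward` and `sae_loss`, the sizes `d_model` and `d_dict`, the coefficient `l1_coeff`, and the random stand-in data are all hypothetical.

```python
import numpy as np


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode activations into non-negative features, then reconstruct them."""
    features = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)  # ReLU encoder
    x_hat = features @ W_dec + b_dec                         # linear decoder
    return features, x_hat


def sae_loss(x, features, x_hat, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparse features."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(features), axis=-1))
    return recon + l1_coeff * sparsity


# Random stand-ins for model activations (assumed sizes, not from the paper).
rng = np.random.default_rng(0)
d_model, d_dict, batch = 16, 64, 8
x = rng.normal(size=(batch, d_model))

W_enc = rng.normal(scale=0.1, size=(d_model, d_dict))
W_dec = rng.normal(scale=0.1, size=(d_dict, d_model))
b_enc = np.zeros(d_dict)
b_dec = np.zeros(d_model)

features, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, features, x_hat)
print(features.shape, x_hat.shape)
```

In this formulation each row of `W_dec` is one dictionary direction; after training, a feature is typically interpreted by inspecting the inputs on which its activation is largest.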
Peer Reviews
Decision: Submitted to ICLR 2026
1. The paper provides a novel perspective on the mechanistic interpretability of CLMs, focusing on syntactic rules such as ring opening and closing or valence budgeting. 2. Although the use of SAEs for interpretability in LMs is not new, the approach proposed here, which leverages SAEs with SMARTS patterns, is novel to the best of my knowledge. 3. The analysis of the SAE features is fair and the limitations are properly acknowledged, and it is interesting that there is still need for human expert selection; desp
1. Statistical robustness: the analysis of causal impact in Sections 3.1 and 3.2 lacks an appropriate statistical analysis. I'm not entirely sure what statistical test would be the most appropriate, but considering the number of implicit hypotheses that are being tested, I think that it is important to ensure that the results are not spurious. Similarly, Table 1 and Figure 2 should contain dispersion metrics (or true confidence intervals) calculated through different samples to show the uncertai
1. The work focuses on internal mechanisms related to SMILES syntax handling (e.g., ring closure and branch matching), which is directly relevant to the problem of generating syntactically valid molecules. This aligns with concerns in molecular generation research, where validity constraints are important. 2. The identification of a linearly readable representation related to valence capacity in the residual stream is interesting and may offer conceptual hints for improving structural consisten
1. Limited connection to practical improvements in molecular generation. Although the paper reveals mechanisms for syntax and valence representation, it is not yet clear how these findings could be used to improve molecular generative performance (e.g., validity, novelty, or property-aware design). The interpretability results currently feel more diagnostic than actionable. 2. Interpretability of SAE-derived features varies considerably. While some features appear to align with recognizable che
The paper addresses the important task of understanding how deep learning architectures mechanistically solve inference tasks in a way that is congruent with the scientific structure of the domain and can be assessed by subject matter experts. The application of sparse autoencoders clearly helps with feature engineering for pharmacological tasks.
The authors are unreasonably generous with the term "chemical reasoning". Neither the model that they chose nor the analysis that they performed rises to the level of reasoning; they show only good old correlations. It is not obvious what exactly one gains from the mechanistic analysis performed on SMILES. SMILES has a peculiar syntax, and any model that handles SMILES has to be able to deal with it. We know that transformers can deal with SMILES; what has changed in our understanding once we learned which head trac
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Advanced Graph Neural Networks
