Thinking like a CHEMIST: Combined Heterogeneous Embedding Model Integrating Structure and Tokens
Nikolai Rekut, Alexey Orlov, Klea Ziu, Elizaveta Starykh, Martin Takac, Aleksandr Beznosikov

TL;DR
This paper introduces a novel molecular representation combining substructure descriptors with language and graph models, improving performance in chemical prediction tasks.
Contribution
It proposes a combined heterogeneous embedding model that integrates detailed substructure descriptors with language and graph-based models for chemistry applications.
Findings
Improved QSAR prediction accuracy.
Enhanced molecular representation capturing chemical details.
Effective integration of substructure descriptors with models.
Abstract
Representing molecular structures effectively in chemistry remains a challenging task. Language models and graph-based models are extensively utilized within this domain, consistently achieving state-of-the-art results across an array of tasks. However, the prevailing practice of representing chemical compounds in the SMILES format, used by most data sets and many language models, presents notable limitations as a training data format. In this study, we present a novel approach that decomposes molecules into substructures and computes descriptor-based representations for these fragments, providing more detailed and chemically relevant input for model training. We use this substructure and descriptor data as input for a language model and also propose a bimodal architecture that integrates this language model with graph-based models. As the LM we use RoBERTa, Graph Isomorphism Networks…
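The bimodal architecture pairs a language-model embedding with a graph-model embedding of the same molecule, and one of the reviews describes the alignment between the two as contrastive. The excerpts here do not give the loss itself, so the following is only a minimal sketch, assuming a symmetric InfoNCE-style objective over cosine similarities. The function names (`cosine`, `info_nce`), the `temperature` value, and the toy 2-D embeddings are all hypothetical and not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors (assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(text_emb, graph_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss (an assumed form, not the paper's).

    text_emb[i] and graph_emb[i] are treated as two views of molecule i;
    every other pairing in the batch acts as a negative.
    """
    logits = [
        [cosine(t, g) / temperature for g in graph_emb]
        for t in text_emb
    ]

    def cross_entropy_on_diagonal(rows):
        # Mean of -log softmax(row)[i], using a stable log-sum-exp.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    columns = [list(col) for col in zip(*logits)]
    # Average the text-to-graph and graph-to-text directions.
    return 0.5 * (
        cross_entropy_on_diagonal(logits)
        + cross_entropy_on_diagonal(columns)
    )


# Toy batch: two molecules with hypothetical 2-D embeddings.
text_side = [[1.0, 0.0], [0.0, 1.0]]
graph_matched = [[1.0, 0.0], [0.0, 1.0]]
graph_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = info_nce(text_side, graph_matched)
loss_swapped = info_nce(text_side, graph_swapped)
```

In this toy batch the matched pairing gives a much smaller loss than the swapped one, which is the behaviour a contrastive alignment objective is meant to reward. A real implementation would operate on batched tensors from the two encoders.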
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
The paper presents a clear methodological pipeline that bridges chemical descriptors with transformer-based and graph-based architectures, emphasizing interpretability and domain knowledge. The BRICS-based decomposition is a reasonable choice for fragment-level modeling, offering a structured alternative to SMILES-based tokenization. The contrastive learning framework for aligning substructure- and graph-level embeddings is technically well-motivated. Performance improvements on several benchmar
Despite its clarity, the paper lacks true novelty and broader experimental support. The approach essentially reuses established components (RoBERTa, GCN/GIN/Graphormer, BRICS fragmentation, and descriptor-based features) and combines them without demonstrating a clear new principle or theoretical insight. The claim of “thinking like a chemist” remains largely rhetorical—there is no evidence that the model captures reasoning-like processes, causal relations, or interpretable chemistry. Moreover,
The method demonstrates strong performance on downstream molecular property prediction tasks.
Using “Thinking Like a Chemist” in your title naturally sets the expectation that the model can reason about chemistry, make decisions, or mimic a chemist’s problem-solving process (e.g., predicting reaction outcomes, designing new molecules intelligently, or explaining chemical phenomena). Your paper is about learning molecular representations, so the title is misleading, because representation learning alone does not involve reasoning. The approach of fragmenting molecules using BRICS and repre
1. The idea of using BRICS fragmentation to create a "chemical vocabulary" and then describing each "word" (substructure) with a rich set of descriptors is creative and well-motivated. The argument for aligning the model's "thinking" with a chemist's fragment-based reasoning is compelling. 2. The paper is generally well-written and clear. The figures effectively illustrate the overall architecture and key processes like tokenization and graph augmentation. The methodology is explained in a logic
1. A significant weakness is the lack of discussion and comparison with other recent multi-modal molecular models. The related work section and experiments focus on unimodal (SMILES-based LMs or GNNs) and simpler bimodal (SMILES+Graph) models. However, several advanced multi-modal frameworks have been proposed that also aim to fuse different molecular perspectives. Notably: a. MoleculeSTM (Liu et al., Nature Machine Intelligence 2023) is a multi-modal model that aligns molecular structures with
This paper is well written and easy to understand.
1. Insufficient Discussion of Related Work: The paper does not adequately situate itself within the growing field of fragment- or motif-based molecular pre-training. There is no discussion or comparison with models like FineMolTex [1], MoleculeSTM [2], and MolCA [3], which have a very similar bimodal design. This omission weakens the claim of novelty. 2. Narrow Scope of Evaluation: The experimental validation is limited to standard property prediction tasks (QSAR). For a bimodal mode
Taxonomy
Topics: Genetics, Bioinformatics, and Biomedical Research
Methods: Attention Is All You Need · Adam · Softmax · Dropout · Weight Decay · Dense Connections · Attention Dropout · Linear Layer · Layer Normalization · Residual Connection
