Entropy-Guided Dynamic Tokens for Graph-LLM Alignment in Molecular Understanding
Zihao Jing, Qiuhao Zeng, Ruiyi Fang, Yan Sun, Boyu Wang, Pingzhao Hu

TL;DR
EDT-Former introduces entropy-guided dynamic tokens for molecular graph understanding, enabling efficient alignment with LLMs without extensive tuning, and achieves state-of-the-art results in molecular understanding tasks.
Contribution
It proposes a novel entropy-guided dynamic token transformer that aligns molecular graph features with LLMs efficiently without tuning the entire LLM backbone.
Findings
Achieves state-of-the-art results on multiple molecular benchmarks.
Enables efficient LLM alignment without full model fine-tuning.
Preserves local and global molecular structural features.
Abstract
Molecular understanding is central to advancing areas such as scientific discovery, yet Large Language Models (LLMs) struggle to understand molecular graphs effectively. Existing graph-LLM bridges often adapt the Q-Former-style connector, which was originally designed for vision tasks, with fixed-length static tokens. These designs overlook stereochemistry and substructural context and typically require costly LLM-backbone fine-tuning, limiting efficiency and generalization. We introduce EDT-Former, an Entropy-guided Dynamic Token Transformer that generates tokens aligned with informative molecular patches, thereby preserving both local and global structural features for molecular graph understanding. Beyond prior approaches, EDT-Former enables alignment between frozen graph encoders and LLMs without tuning the LLM backbone (excluding the embedding layer), resulting in computationally…
Peer Reviews
Decision: ICLR 2026 Poster
- Good motivation: This work is well motivated by the need for dynamic-length graph tokens that capture molecular substructure information.
- Methodological novelty: The proposed tokenization is novel. It relies on entropy-guided segmentation of molecules at uncertainty peaks from a next-atom predictor, offering a data-driven and deterministic patching mechanism. This design appears to be suitable for learning the representation of molecular functional groups, considering the dyna
Overall, I find the proposed method interesting and reasonable (though I have a different opinion regarding NAP); my concerns primarily relate to the experimental setting and the demonstration of the authors' hypothesis. I hope these are properly addressed in the rebuttal phase.
- Ambiguity in the description of experimental settings and results: (Major) In line 312, they mention evaluating with Direct, Reasoning, and Rich Instructions prompting to reduce prompt sensitivity, but there is no
1. This work targets the fixed-token bottleneck in graph-LLM alignment, which is timely and critical.
2. The proposed approach is novel and interesting.
3. There are significant empirical improvements.
1. The benchmarked tasks seem to be limited. For example, can this approach be applied to other tasks in Mol-Instructions?
2. The empirical comparison does not seem to be fair, as EDT-Former uses a different training corpus from the baseline approaches. Given the efficiency of the proposed approach, can EDT-Former be applied and ablated with different instruction training data?
3. Lack of comparison and discussion with a closely related work [1]. For example, can the proposed tokenization scheme m
- The paper is well-written and easy to follow.
- The paper proposes a novel query Transformer. Most works simply abstract molecules into queries, not considering the stereochemistry and structural context in the molecule. Another line of work exploits rule-based algorithms to extract the meaningful substructures. Different from them, the proposed work automatically extracts substructures by applying entropy-based patching segments.
- Experimental results demonstrate the effectiveness of EDT-For
- The entire entropy-patching mechanism is based on a 1D SMILES sequence. The properties of a SMILES string (and thus its entropy profile) can change based on the canonicalization algorithm used or if non-canonical strings are permitted. The paper does not discuss the robustness of the patching mechanism to different, yet chemically equivalent, SMILES representations of the same molecule.
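
The concern above can be made concrete with a small sketch. The code below is not the authors' implementation: it swaps in a toy bigram model for the next-atom predictor, uses a naive character-level SMILES tokenizer, and treats a local entropy maximum as a patch boundary. All function names (`tokenize`, `fit_bigram`, `next_token_entropy`, `entropy_profile`, `patch_at_peaks`) and the tiny corpus are hypothetical and chosen only for illustration. Under those assumptions, it shows how the entropy profile, and therefore the patching, is computed from the SMILES string rather than from the molecular graph.

```python
import math
from collections import Counter, defaultdict


def tokenize(smiles):
    """Naive tokenizer (assumption): single characters, plus Cl and Br."""
    tokens, i = [], 0
    while i < len(smiles):
        if smiles[i:i + 2] in ("Cl", "Br"):
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens


def fit_bigram(corpus):
    """Count next-token frequencies; a toy stand-in for a next-atom predictor."""
    counts = defaultdict(Counter)
    for smiles in corpus:
        toks = ["<s>"] + tokenize(smiles)
        for prev, nxt in zip(toks, toks[1:]):
            counts[prev][nxt] += 1
    return counts


def next_token_entropy(counts, prev):
    """Shannon entropy in bits of the next-token distribution after `prev`."""
    dist = counts.get(prev)
    if not dist:
        return 0.0  # unseen context: no estimate, treated as zero here
    total = sum(dist.values())
    return -sum((n / total) * math.log2(n / total) for n in dist.values())


def entropy_profile(counts, smiles):
    """Entropy of predicting each token from the token before it."""
    toks = tokenize(smiles)
    prevs = ["<s>"] + toks[:-1]
    return toks, [next_token_entropy(counts, p) for p in prevs]


def patch_at_peaks(tokens, entropies):
    """Start a new patch at every local maximum of the entropy profile."""
    patches = []
    for i, tok in enumerate(tokens):
        prev_h = entropies[i - 1] if i > 0 else float("inf")
        next_h = entropies[i + 1] if i + 1 < len(entropies) else float("-inf")
        is_peak = entropies[i] > prev_h and entropies[i] >= next_h
        if not patches or is_peak:
            patches.append([tok])
        else:
            patches[-1].append(tok)
    return ["".join(p) for p in patches]


if __name__ == "__main__":
    # Hypothetical mini-corpus; a real predictor would be trained on far more data.
    model = fit_bigram(["CCO", "CCN", "CC(=O)O", "c1ccccc1", "CC(C)O"])
    # Two chemically equivalent ways of writing acetic acid.
    for s in ("CC(=O)O", "OC(C)=O"):
        toks, hs = entropy_profile(model, s)
        print(s, patch_at_peaks(toks, hs))
```

Running the demo on two equivalent SMILES strings for the same molecule lets one compare the resulting patches directly. In practice one would first canonicalize the input with a cheminformatics toolkit, or measure patch stability across randomized SMILES, to quantify the robustness the reviewer asks about.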
Taxonomy
Topics: Advanced Graph Neural Networks · Machine Learning in Materials Science · Computational Drug Discovery Methods
