Entropy-Guided Dynamic Tokens for Graph-LLM Alignment in Molecular Understanding

Zihao Jing; Qiuhao Zeng; Ruiyi Fang; Yan Sun; Boyu Wang; Pingzhao Hu

arXiv:2602.02742 · cs.LG · March 3, 2026

Open Access · 3 Reviews

TL;DR

EDT-Former introduces entropy-guided dynamic tokens for molecular graph understanding, enabling efficient alignment with LLMs without extensive tuning, and achieves state-of-the-art results in molecular understanding tasks.

Contribution

It proposes a novel entropy-guided dynamic token transformer that aligns molecular graph features with LLMs efficiently without tuning the entire LLM backbone.

Findings

01

Achieves state-of-the-art results on multiple molecular benchmarks.

02

Enables efficient LLM alignment without full model fine-tuning.

03

Preserves local and global molecular structural features.

Abstract

Molecular understanding is central to advancing areas such as scientific discovery, yet Large Language Models (LLMs) struggle to understand molecular graphs effectively. Existing graph-LLM bridges often adapt the Q-Former-style connector, which was originally designed for vision tasks, with fixed-length static tokens. These designs overlook stereochemistry and substructural context and typically require costly LLM-backbone fine-tuning, limiting efficiency and generalization. We introduce EDT-Former, an Entropy-guided Dynamic Token Transformer that generates tokens aligned with informative molecular patches, thereby preserving both local and global structural features for molecular graph understanding. Beyond prior approaches, EDT-Former enables alignment between frozen graph encoders and LLMs without tuning the LLM backbone (excluding the embedding layer), resulting in computationally…

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 2 · Confidence 4

Strengths

- Good motivation: This work is well motivated by the need for a dynamic-length graph token that captures molecular substructure information.
- Methodological novelty: The proposed tokenization is novel. It uses entropy-guided segmentation of molecules at uncertainty peaks from a next-atom predictor, offering a data-driven and deterministic patching mechanism. This design appears to be suitable for learning the representation of molecular functional groups, considering the dyna
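The segmentation idea the reviewer describes (cutting a molecule at uncertainty peaks of a next-atom predictor) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the local-maximum peak rule, the threshold, and the entropy values in the example are all assumptions made here, and in practice the entropies would come from a trained predictor.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of one predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def is_peak(entropies, i, threshold):
    """Assumed peak rule: above threshold and a local maximum."""
    left = entropies[i - 1] if i > 0 else float("-inf")
    right = entropies[i + 1] if i + 1 < len(entropies) else float("-inf")
    return entropies[i] > threshold and entropies[i] >= left and entropies[i] > right


def entropy_patches(tokens, entropies, threshold):
    """Split tokens into patches, opening a new patch at each entropy peak.

    entropies[i] is taken to be the predictor's uncertainty about tokens[i]
    given the preceding tokens (a hypothetical input for this sketch).
    """
    if len(tokens) != len(entropies):
        raise ValueError("tokens and entropies must have the same length")
    patches, current = [], []
    for i, tok in enumerate(tokens):
        # Close the running patch when uncertainty spikes at position i.
        if current and is_peak(entropies, i, threshold):
            patches.append(current)
            current = []
        current.append(tok)
    if current:
        patches.append(current)
    return patches


# Made-up entropies for a short token sequence, for illustration only.
tokens = ["C", "C", "O", "C", "(", "=", "O", ")", "N"]
entropies = [0.2, 0.1, 1.4, 0.3, 0.9, 0.2, 0.1, 0.3, 1.6]
patches = entropy_patches(tokens, entropies, threshold=1.0)
```

Because the number of peaks varies per molecule, the number of patches (and hence tokens) is dynamic rather than fixed, which is the property the review highlights.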

Weaknesses

Overall, I find the proposed method interesting and reasonable (though I have a different opinion regarding NAP); my concerns primarily relate to the experimental setting and the demonstration of the authors' hypothesis. I hope these are properly addressed in the rebuttal phase.
- Ambiguity in the Description of Experimental Settings and Results
  - (Major) In line 312, they mention evaluating with Direct, Reasoning, and Rich Instructions prompting to reduce prompt sensitivity, but there is no

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. This work targets the fixed-token bottleneck in graph-LLM alignment, which is timely and critical.
2. The proposed approach is novel and interesting.
3. There are significant empirical improvements.

Weaknesses

1. The benchmarked tasks seem to be limited. For example, can this approach be applied to other tasks in Mol-Instructions?
2. The empirical comparison does not seem to be fair, as EDT-Former uses a different training corpus from the baseline approaches. Given the efficiency of the proposed approach, can EDT-Former be applied and ablated with different instruction training data?
3. Lack of comparison and discussion with a closely related work [1]. For example, can the proposed tokenization scheme m

Reviewer 03 · Rating 6 · Confidence 5

Strengths

- The paper is well-written and easy to follow.
- The paper proposes a novel query Transformer. Most works simply abstract molecules into queries, not considering the stereochemistry and structural context in the molecule. Another line of work exploits rule-based algorithms to extract the meaningful substructures. Different from them, the proposed work automatically extracts substructures by applying entropy-based patching segments.
- Experimental results demonstrate the effectiveness of EDT-For

Weaknesses

- The entire entropy-patching mechanism is based on a 1D SMILES sequence. The properties of a SMILES string (and thus its entropy profile) can change based on the canonicalization algorithm used or if non-canonical strings are permitted. The paper does not discuss the robustness of the patching mechanism to different, yet chemically equivalent, SMILES representations of the same molecule.
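One way to probe the concern above is to run a patcher over several chemically equivalent SMILES strings and compare the resulting patch counts. The sketch below is an assumption-laden illustration: `toy_patches` is a hypothetical stand-in rule (start a new patch before every non-carbon character), not the paper's entropy-based mechanism, and the three ethanol strings are hand-written variants rather than output of any canonicalization tool.

```python
def toy_patches(smiles):
    """Stand-in patcher: open a new patch before every non-'C' character.

    Purely illustrative; NOT the entropy-based rule from the paper.
    """
    patches, current = [], ""
    for ch in smiles:
        if current and ch != "C":
            patches.append(current)
            current = ""
        current += ch
    if current:
        patches.append(current)
    return patches


def patch_count_by_variant(variants, patch_fn):
    """Map each SMILES variant to the number of patches patch_fn yields."""
    return {s: len(patch_fn(s)) for s in variants}


# Three valid SMILES strings for the same molecule (ethanol).
ethanol_variants = ["CCO", "OCC", "C(C)O"]
counts = patch_count_by_variant(ethanol_variants, toy_patches)
is_order_invariant = len(set(counts.values())) == 1
```

With this stand-in rule the counts differ across the three strings, so `is_order_invariant` is false. Swapping in the actual patching function for `toy_patches` would give a direct measure of how sensitive it is to the choice of SMILES string.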

Taxonomy

Topics: Advanced Graph Neural Networks · Machine Learning in Materials Science · Computational Drug Discovery Methods