Text-Augmented Multimodal LLMs for Chemical Reaction Condition Recommendation

Yu Zhang; Ruijie Yu; Kaipeng Zeng; Ding Li; Feng Zhu; Xiaokang Yang; Yaohui Jin; Yanyan Xu

arXiv:2407.15141 · cs.AI · September 26, 2025 · 2 citations

TL;DR

This paper introduces Chemma-RC, a multimodal large language model that integrates text, reaction SMILES, and graphs to improve the prediction of reaction conditions, significantly enhancing efficiency in chemical synthesis optimization.

Contribution

The paper presents a novel multimodal LLM architecture, Chemma-RC, that unifies multiple data modalities for better reaction condition prediction, outperforming existing methods.
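
One of the input modalities named above is reaction SMILES. As a minimal illustration (not the paper's preprocessing code), the sketch below splits a reaction SMILES string of the standard `reactants>agents>products` form into its three parts. The agents field is where catalysts, solvents, and other reagents, that is, the reaction conditions, are conventionally written. The names `ReactionParts` and `split_reaction_smiles` are hypothetical, and the example assumes plain reaction SMILES without trailing extension annotations.

```python
from typing import List, NamedTuple


class ReactionParts(NamedTuple):
    """Hypothetical container for the three fields of a reaction SMILES."""
    reactants: List[str]
    agents: List[str]      # catalysts, solvents, reagents (the "conditions")
    products: List[str]


def split_reaction_smiles(rxn: str) -> ReactionParts:
    """Split 'reactants>agents>products' into lists of molecule SMILES.

    Assumption: the input is plain reaction SMILES, so '>' appears only
    as the field separator and '.' separates molecules within a field.
    """
    fields = rxn.strip().split(">")
    if len(fields) != 3:
        raise ValueError(f"expected exactly two '>' separators, got: {rxn!r}")

    def molecules(field: str) -> List[str]:
        # An empty field (e.g. no agents recorded) yields an empty list.
        return [m for m in field.split(".") if m]

    return ReactionParts(*(molecules(f) for f in fields))


# Fischer esterification: ethanol + acetic acid, sulfuric acid as agent.
parts = split_reaction_smiles("CCO.CC(=O)O>OS(=O)(=O)O>CCOC(C)=O.O")
print(parts.reactants, parts.agents, parts.products)
```

In a condition recommendation setting the agents field is what is left blank for a model to fill in, so a reaction written as `reactants>>products` comes back with an empty agents list.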

Findings

01. Up to 17% improvement over state-of-the-art methods in condition identification.

02. High precision in predicting optimal reaction conditions.

03. Successful experimental validation on palladium-catalysed C-H arylation.

Abstract

Identifying reaction conditions that are broadly applicable across diverse substrates is a longstanding challenge in chemical and pharmaceutical research. While many methods are available to generate conditions with acceptable performance, a universal approach for reliably discovering effective conditions during reaction exploration is rare. Consequently, current reaction optimization processes are often labor-intensive, time-consuming, and costly, relying heavily on trial-and-error experimentation. Nowadays, large language models (LLMs) are capable of tackling chemistry-related problems, such as molecule design and chemical reasoning tasks. Here, we report the design, implementation and application of Chemma-RC, a text-augmented multimodal LLM to identify effective conditions through task-specific dialogue and condition generation. Chemma-RC learns a unified representation of chemical…

Peer Reviews

Decision: ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 4

Strengths

Chemma-RC captures the complexity of chemical reactions more comprehensively by combining multiple modalities, including SMILES, graph structures, and textual corpora. This integration may enhance the model's understanding and predictive capabilities regarding reaction conditions. Chemma-RC is trained on a dataset of 1.2 million question-and-answer pairs, significantly enhancing the model's learning efficacy and accuracy, especially in high-throughput experiments. By incorporating text-augmented

Weaknesses

Integrating multiple modalities enhances the model's abilities but also complicates it, potentially leading to longer training durations and increased debugging challenges. The model depends on extensive, high-quality multimodal data for training, which may be difficult to obtain, particularly in data-scarce environments. The complexity of the model can hinder its interpretability, making it challenging to trace the rationale behind specific predictions, a crucial aspect in chemistry. The model'

Reviewer 02 · Rating 5 · Confidence 5

Strengths

1. Large-scale training data: Chemma-RC utilizes a dataset comprising 1.2 million pairs of question-and-answer instructions during training. The substantial size of this dataset may contribute to improved learning efficacy and accuracy of the model, enhancing its performance on high-throughput experimental data.

2. Generalization capability: The paper indicates that Chemma-RC performs well on out-of-domain (OOD) and high-throughput experimentation (HTE) datasets, demonstrating its robustness an

Weaknesses

1. While the paper presents Chemma-RC as a novel multimodal model, the experimental results lack sufficient evidence that its core techniques—such as modality alignment and instruction tuning—offer a distinct advantage over simpler models like T-Rex and TextReact. To address this, I recommend that the authors include specific comparisons to highlight the unique contributions of Chemma-RC. For example, broadening the set of baseline models beyond T-Rex and TextReact could provide a more comprehen

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The application of language models to chemical reaction planning represents a highly valuable direction for research. While the incorporation of additional data modalities into language models is not entirely novel, it is still an important approach that has the potential to enhance current methodologies by utilizing a broader range of available data. Moreover, the demonstrated ability of the Chemma-RC model to handle out-of-distribution data adds further significance to this work. Experimental

Weaknesses

While the paper is generally well-written, certain sections lack clarity and sufficient justification for specific decisions made in the study. Additional details on these issues can be found in the Question section.

Taxonomy

Topics: Advanced Text Analysis Techniques · Semantic Web and Ontologies · Service-Oriented Architecture and Web Services