MolX: Enhancing Large Language Models for Molecular Understanding With A Multi-Modal Extension
Khiem Le, Zhichun Guo, Kaiwen Dong, Xiaobao Huang, Bozhao Nan, Roshni Iyer, Xiangliang Zhang, Olaf Wiest, Wei Wang, Ting Hua, Nitesh V. Chawla

TL;DR
MolX enhances large language models for molecular understanding by integrating multi-modal features from molecular graphs and fingerprints, significantly improving performance on molecule-related tasks with minimal additional training parameters.
Contribution
This work introduces MolX, a multi-modal extension that helps LLMs understand molecules by incorporating molecular graph and fingerprint features, a novel approach to modeling in the chemical domain.
Findings
Outperforms baseline models on molecular tasks
Requires only 0.53% to 0.82% additional trainable parameters
Effective in both fine-tuned and zero-shot settings
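To give a sense of scale for the parameter figure above, the following is a minimal arithmetic sketch. It assumes the percentage is measured relative to the parameter count of the base LLM, which this summary does not state. The function name and the 1-billion-parameter base size are hypothetical illustrations, not values from the paper.

```python
def implied_added_parameters(percentage, base_parameter_count):
    """Number of added trainable parameters implied by a percentage.

    Assumption (not stated in the summary): the percentage is taken
    relative to the base model's total parameter count.
    """
    if base_parameter_count <= 0:
        raise ValueError("base_parameter_count must be positive")
    return percentage / 100.0 * base_parameter_count


# Hypothetical base model size, chosen only to keep the arithmetic readable.
HYPOTHETICAL_BASE = 1_000_000_000

low = implied_added_parameters(0.53, HYPOTHETICAL_BASE)
high = implied_added_parameters(0.82, HYPOTHETICAL_BASE)
print(f"{low:,.0f} to {high:,.0f} added trainable parameters")
```

Under that assumption, the reported range corresponds to a few million added parameters per billion parameters in the base model.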
Abstract
Large Language Models (LLMs) with their strong task-handling capabilities have shown remarkable advancements across a spectrum of fields, moving beyond natural language understanding. However, their proficiency within the chemistry domain remains restricted, especially in solving molecule-related tasks. This challenge is attributed to their inherent limitations in comprehending molecules using only common textual representations, i.e. SMILES strings. In this study, we seek to enhance the ability of LLMs to comprehend molecules by equipping them with a multi-modal external module, termed MolX. Instead of directly using SMILES strings to represent a molecule, we utilize specific encoders to extract fine-grained features from both SMILES string and 2D molecular graph representations for feeding into an LLM. A hand-crafted molecular fingerprint is incorporated to leverage its embedded…
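To make the abstract's distinction between SMILES strings and 2D molecular graphs concrete, here is a minimal sketch in plain Python. It is not MolX's encoder or any library's parser: `smiles_to_graph` and `degree_signature` are hypothetical helpers that handle only a tiny SMILES subset (single-letter atoms, implicit single bonds, parenthesized branches, single-digit ring closures).

```python
def smiles_to_graph(smiles):
    """Parse a tiny SMILES subset into (atoms, bonds).

    Hypothetical helper for illustration only. Anything outside the
    supported subset (e.g. "Cl", "=", "[", aromatic atoms) raises ValueError.
    """
    atoms = []          # atom symbols, indexed by order of appearance
    bonds = []          # (i, j) pairs of atom indices
    branch_stack = []   # atom to return to when a branch closes
    open_rings = {}     # ring-closure digit -> atom index that opened it
    prev = None         # index of the atom the next atom bonds to

    for ch in smiles:
        if ch in "BCNOFPSI":
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx))
            prev = idx
        elif ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()
        elif ch.isdigit():
            if ch in open_rings:
                bonds.append((open_rings.pop(ch), prev))
            else:
                open_rings[ch] = prev
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    return atoms, bonds


def degree_signature(atoms, bonds):
    """Sorted (symbol, degree) pairs: a simple order-independent summary."""
    degree = [0] * len(atoms)
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    return sorted(zip(atoms, degree))


ethanol_a = degree_signature(*smiles_to_graph("CCO"))
ethanol_b = degree_signature(*smiles_to_graph("OCC"))
dimethyl_ether = degree_signature(*smiles_to_graph("COC"))

print(ethanol_a == ethanol_b)        # same molecule, different strings
print(ethanol_a == dimethyl_ether)   # same characters, different bonding
```

Here "CCO" and "OCC" are two different strings for the same molecule (ethanol) and yield matching signatures, while "COC" (dimethyl ether) uses the same characters with a different bonding pattern. The signature is only a necessary condition for two graphs to match, not a full isomorphism test, but it shows how connectivity that is implicit in the text becomes explicit in the graph.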
Taxonomy
Topics: Machine Learning in Materials Science · Topic Modeling
Methods: Sparse Evolutionary Training
