Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model

Dongki Kim; Wonbin Lee; Sung Ju Hwang

arXiv:2502.13449 · cs.LG · October 3, 2025

TL;DR

Mol-LLaMA is a large molecular language model built to improve understanding, reasoning, and explainability in molecular analysis, addressing the limited knowledge and reasoning capabilities of prior models.

Contribution

The paper introduces Mol-LLaMA, a novel molecular language model that integrates multiple molecular encoders and specialized data types for improved molecular understanding and reasoning.

Findings

1. Mol-LLaMA effectively comprehends fundamental molecular features.
2. The model provides informative and explainable responses.
3. Experimental results show improved molecular reasoning capabilities.

Abstract

Understanding molecules is key to understanding organisms and driving advances in drug discovery, requiring interdisciplinary knowledge across chemistry and biology. Although large molecular language models have achieved notable success in task transfer, they often struggle to accurately analyze molecular features due to limited knowledge and reasoning capabilities. To address this issue, we present Mol-LLaMA, a large molecular language model that grasps the general knowledge centered on molecules and exhibits explainability and reasoning ability. To this end, we design key data types that encompass the fundamental molecular features, taking into account the essential abilities for molecular reasoning. Further, to improve molecular understanding, we propose a module that integrates complementary information from different molecular encoders, leveraging the distinct advantages of…
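The abstract as shown here is truncated and does not say how the integration module works. As a rough illustration only, the sketch below shows one common way to combine token embeddings from two encoders: cross-attention with a residual connection. The function name `cross_attention_fuse`, the encoder labels (`h_2d`, `h_3d`), all dimensions, and the random weights are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention_fuse(h_a, h_b, w_q, w_k, w_v):
    """Hypothetical fusion: tokens from encoder A attend over tokens from encoder B.

    h_a: (n_a, d_a) embeddings from the first encoder
    h_b: (n_b, d_b) embeddings from the second encoder
    w_q: (d_a, d_a), w_k: (d_b, d_a), w_v: (d_b, d_a) projection matrices
    Returns the fused embeddings (n_a, d_a) and the attention weights (n_a, n_b).
    """
    q = h_a @ w_q
    k = h_b @ w_k
    v = h_b @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    # Residual connection keeps encoder A's own information in the output.
    return h_a + weights @ v, weights


# Toy inputs standing in for two molecular encoders (assumed shapes).
rng = np.random.default_rng(0)
h_2d = rng.normal(size=(5, 8))    # e.g. 5 tokens from a graph-based encoder
h_3d = rng.normal(size=(7, 16))   # e.g. 7 tokens from a conformer-based encoder
w_q = rng.normal(size=(8, 8)) * 0.1
w_k = rng.normal(size=(16, 8)) * 0.1
w_v = rng.normal(size=(16, 8)) * 0.1

fused, weights = cross_attention_fuse(h_2d, h_3d, w_q, w_k, w_v)
print(fused.shape)  # (5, 8)
```

In this sketch each row of `weights` sums to 1, so every token from the first encoder receives a convex combination of projected tokens from the second. A trained model would learn the projection matrices rather than sample them at random.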

Code & Models

Models

Datasets

DongkiKim/Mol-LLaMA-Instruct (dataset, 36 downloads)

Videos

Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model (SlidesLive)

Taxonomy

Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Machine Learning in Bioinformatics