Chem3DLLM: 3D Multimodal Large Language Models for Chemistry

Lei Jiang; Shuzhou Sun; Biqing Qi; Yuchen Fu; Xiaohua Xu; Yuqiang Li; Dongzhan Zhou; Tianfan Fu

arXiv:2508.10696·cs.CE·August 15, 2025

TL;DR

Chem3DLLM is a protein-conditioned 3D multimodal large language model for chemistry. It integrates molecular geometry with protein data for drug design and reports improved structural validity and performance.

Contribution

The paper presents what it describes as the first unified LLM architecture in chemistry modeling to combine a reversible 3D structure encoding with reinforcement learning optimization.

Findings

1. Achieved state-of-the-art Vina score of -7.21 in drug design.

2. Successfully integrated 3D molecular structures with protein data in a single model.

3. Demonstrated improved chemical validity and structural accuracy.

Abstract

In the real world, a molecule is a 3D geometric structure. Compared to 1D SMILES sequences and 2D molecular graphs, 3D molecules represent the most informative molecular modality. Despite the rapid progress of autoregressive-based language models, they cannot handle the generation of 3D molecular conformation due to several challenges: 1) 3D molecular structures are incompatible with LLMs' discrete token space, 2) integrating heterogeneous inputs like proteins, ligands, and text remains difficult within a unified model, and 3) LLMs lack essential scientific priors, hindering the enforcement of physical and chemical constraints during generation. To tackle these issues, we present Chem3DLLM, a unified protein-conditioned multimodal large language model. Our approach designs a novel reversible text encoding for 3D molecular structures using run-length compression, achieving 3x size…
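The abstract mentions a reversible text encoding built on run-length compression but does not give its details here. As a rough illustration of the general idea only, the following is a minimal sketch of generic character-level run-length encoding with a lossless round trip. The function names `rle_encode` and `rle_decode` and the sample coordinate line are hypothetical and are not taken from the paper; the paper's actual encoding may differ substantially.

```python
from itertools import groupby


def rle_encode(text):
    """Collapse each run of a repeated character into a (char, count) pair.

    Generic run-length encoding, NOT the paper's scheme (an assumption
    made purely for illustration).
    """
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs):
    """Invert rle_encode by expanding every (char, count) pair."""
    return "".join(char * count for char, count in pairs)


# Hypothetical fixed-width coordinate line with long runs of padding spaces.
line = "C     0.0000    1.4000    0.0000"
pairs = rle_encode(line)

# The encoding is reversible: decoding recovers the input exactly.
assert rle_decode(pairs) == line
# Runs of spaces and zeros collapse, so there are fewer pairs than characters.
assert len(pairs) < len(line)
```

Storing explicit (character, count) pairs keeps the round trip unambiguous even when the text itself contains digits, as coordinate data does. A string format such as "3a1b" would need an escaping rule to stay reversible in that case.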
