Chem3DLLM: 3D Multimodal Large Language Models for Chemistry
Lei Jiang, Shuzhou Sun, Biqing Qi, Yuchen Fu, Xiaohua Xu, Yuqiang Li, Dongzhan Zhou, Tianfan Fu

TL;DR
Chem3DLLM is a 3D multimodal large language model for chemistry that integrates molecular geometry with protein data, targeting protein-conditioned drug design with improved structural validity.
Contribution
The paper presents a unified LLM architecture that combines a reversible text encoding of 3D structures with reinforcement learning optimization, which it describes as a first in chemistry modeling.
Findings
Achieved a state-of-the-art Vina score of -7.21 in drug design (for docking scores, more negative values indicate stronger predicted binding).
Successfully integrated 3D molecular structures with protein data in a single model.
Demonstrated improved chemical validity and structural accuracy.
Abstract
In the real world, a molecule is a 3D geometric structure. Compared to 1D SMILES sequences and 2D molecular graphs, 3D molecules represent the most informative molecular modality. Despite the rapid progress of autoregressive-based language models, they cannot handle the generation of 3D molecular conformation due to several challenges: 1) 3D molecular structures are incompatible with LLMs' discrete token space, 2) integrating heterogeneous inputs like proteins, ligands, and text remains difficult within a unified model, and 3) LLMs lack essential scientific priors, hindering the enforcement of physical and chemical constraints during generation. To tackle these issues, we present Chem3DLLM, a unified protein-conditioned multimodal large language model. Our approach designs a novel reversible text encoding for 3D molecular structures using run-length compression, achieving 3x size…
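To make the idea of a reversible, run-length-compressed text encoding concrete, here is a minimal Python sketch. It is not the authors' encoding, whose details are not given in this excerpt. It only illustrates the general technique on a fixed-width coordinate line of the kind found in molecular structure files. The function names `rle_encode` and `rle_decode`, the `~` escape marker, and the `min_run` threshold are all hypothetical choices made for this illustration.

```python
import re
from itertools import groupby

ESCAPE = "~"  # hypothetical marker; must not occur in the raw text


def rle_encode(text: str, min_run: int = 4) -> str:
    """Collapse runs of a repeated non-digit character into '~<count><char>'.

    Digit runs are left untouched so the count in a token is never ambiguous.
    """
    if ESCAPE in text:
        raise ValueError("input already contains the escape marker")
    out = []
    for ch, group in groupby(text):
        n = len(list(group))
        if n >= min_run and not ch.isdigit():
            out.append(f"{ESCAPE}{n}{ch}")
        else:
            out.append(ch * n)
    return "".join(out)


def rle_decode(encoded: str) -> str:
    """Invert rle_encode by expanding every '~<count><char>' token."""
    return re.sub(
        ESCAPE + r"(\d+)(\D)",
        lambda m: m.group(2) * int(m.group(1)),
        encoded,
    )


# Illustrative fixed-width atom line (made-up coordinates, not from the paper).
line = "    0.0000    1.2000    0.0000 C   0  0"
packed = rle_encode(line)
assert rle_decode(packed) == line  # the round trip is lossless
```

The point of the sketch is the round trip: decoding the compressed string recovers the original text exactly, which is the property a language model needs if its generated tokens are to be mapped back to a 3D structure. How much a scheme like this saves depends entirely on how repetitive the input is.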
