Token-Mol 1.0: Tokenized drug design with large language model
Jike Wang, Rui Qin, Mingyang Wang, Meijing Fang, Yangyang Zhang, Yuchen Zhu, Qun Su, Qiaolin Gou, Chao Shen, Odin Zhang, Zhenxing Wu, Dejun Jiang, Xujun Zhang, Huifeng Zhao, Xiaozhe Wan, Zhourui Wu, Liwei Liu, Yu Kang, Chang-Yu Hsieh, Tingjun Hou

TL;DR
Token-Mol is a token-only 3D drug design model that encodes 2D and 3D molecular structures, together with molecular property data, as tokens, so that classification and regression tasks across drug discovery can be learned through a single unified paradigm.
Contribution
The paper introduces a token-only 3D drug design model together with a new Gaussian cross-entropy (GCE) loss function, which improves an LLM's ability to handle continuous data.
Findings
Achieves comparable or superior performance on multiple drug discovery tasks.
Improves regression accuracy by approximately 30%.
Handles a wider range of downstream tasks than existing models.
Abstract
Significant interest has recently arisen in leveraging sequence-based large language models (LLMs) for drug design. However, most current applications of LLMs in drug discovery cannot comprehend three-dimensional (3D) structures, which limits their effectiveness in tasks that explicitly involve molecular conformations. In this study, we introduce Token-Mol, a token-only 3D drug design model. The model encodes all molecular information, including 2D and 3D structures as well as molecular property data, into tokens. This transforms classification and regression tasks in drug discovery into probabilistic prediction problems and enables learning through a unified paradigm. Token-Mol is built on the transformer decoder architecture and trained using random causal masking techniques. Additionally, we propose the Gaussian cross-entropy (GCE) loss function to…
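The abstract is cut off before it says what the GCE loss does, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a continuous value is binned into one of a fixed number of numeric tokens, and that the one-hot training target is replaced by a Gaussian centered on the true bin, so that predicting a neighboring bin costs less than predicting a distant one. All function names, the uniform binning, and the `sigma` parameter are hypothetical.

```python
import math

def value_to_token(value, low, high, num_bins):
    """Map a continuous value to a bin index (assumed uniform binning)."""
    clipped = min(max(value, low), high)
    width = (high - low) / num_bins
    return min(int((clipped - low) / width), num_bins - 1)

def gaussian_soft_target(center, num_bins, sigma):
    """Normalized Gaussian weights over bin indices, peaked at `center`."""
    weights = [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(num_bins)]
    total = sum(weights)
    return [w / total for w in weights]

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def hard_cross_entropy(logits, target_bin):
    """Standard cross-entropy against a one-hot target."""
    return -log_softmax(logits)[target_bin]

def gaussian_cross_entropy(logits, target_bin, sigma=1.0):
    """Cross-entropy against a Gaussian-smoothed target (assumed form of GCE)."""
    target = gaussian_soft_target(target_bin, len(logits), sigma)
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(target, log_probs))

# Two wrong predictions for true bin 5: one peaks at bin 4, one at bin 9.
near_miss = [0.0] * 10
near_miss[4] = 3.0
far_miss = [0.0] * 10
far_miss[9] = 3.0

true_bin = value_to_token(0.55, 0.0, 1.0, 10)
print(hard_cross_entropy(near_miss, true_bin), hard_cross_entropy(far_miss, true_bin))
print(gaussian_cross_entropy(near_miss, true_bin), gaussian_cross_entropy(far_miss, true_bin))
```

Under these assumptions, the one-hot loss treats both misses identically, while the Gaussian-smoothed loss penalizes the far miss more. That is the kind of ordinal awareness a token-only model would need for regression targets.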
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods
