Translation between Molecules and Natural Language
Carl Edwards, Tuan Lai, Kevin Ros, Garrett Honke, Kyunghyun Cho, Heng Ji

TL;DR
MolT5 is a self-supervised framework for translating between molecules and natural language. It enables tasks such as molecule captioning and text-based de novo molecule generation while easing the data scarcity common in chemistry.
Contribution
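Introduces MolT5, a self-supervised pretraining framework for molecule-language translation. It defines new cross-modal tasks in chemistry (molecule captioning and text-based molecule generation) and proposes a cross-modal embedding-based metric for evaluating them.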
Findings
- MolT5-based models generate molecules and captions that are, in many cases, high quality
- Applies to both molecule captioning and text-based molecule generation
- Pretraining on single-modal (unpaired) data helps offset data scarcity in chemistry
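The data-scarcity point rests on self-supervised pretraining over unpaired text and molecule strings. MolT5 builds on T5, whose standard pretraining objective is span corruption: spans of the input are replaced by sentinel tokens and the model learns to regenerate the hidden spans. The abstract does not spell out the objective, so treat the following as a minimal sketch assuming T5-style span corruption applied to a SMILES string. For readability it uses one character per token rather than T5's SentencePiece vocabulary, and the spans are passed in explicitly instead of being sampled at random; `corrupt_spans` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def corrupt_spans(
    tokens: Sequence[str], spans: Sequence[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """T5-style span corruption (illustrative sketch).

    Each (start, length) span is replaced in the input by one sentinel token.
    The target lists each sentinel followed by the tokens it hid, then a final
    closing sentinel, so the model learns to reconstruct the masked spans.
    """
    inputs: List[str] = []
    targets: List[str] = []
    pos = 0  # index of the first token not yet copied to `inputs`
    ordered = sorted(spans)
    for i, (start, length) in enumerate(ordered):
        if length <= 0 or start < pos or start + length > len(tokens):
            raise ValueError("spans must be non-empty, in range, and non-overlapping")
        sentinel = f"<extra_id_{i}>"  # T5 sentinel naming
        inputs.extend(tokens[pos:start])  # keep the visible tokens
        inputs.append(sentinel)           # hide the span behind a sentinel
        targets.append(sentinel)
        targets.extend(tokens[start:start + length])  # tokens to regenerate
        pos = start + length
    inputs.extend(tokens[pos:])
    targets.append(f"<extra_id_{len(ordered)}>")  # closing sentinel
    return inputs, targets


# Assumption: one character per token (T5 really uses SentencePiece pieces).
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # aspirin
model_input, model_target = corrupt_spans(list(smiles), [(2, 4), (18, 5)])
print("".join(model_input))   # CC<extra_id_0>OC1=CC=CC=C1<extra_id_1>O
print("".join(model_target))  # <extra_id_0>(=O)<extra_id_1>C(=O)<extra_id_2>
```

Because only unlabeled strings are needed, the same transformation can be applied to any text corpus or SMILES collection, which is why this kind of pretraining does not depend on scarce paired molecule-caption data.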
Abstract
We present MolT5, a self-supervised learning framework for pretraining models on a vast amount of unlabeled natural language text and molecule strings. MolT5 allows for new, useful, and challenging analogs of traditional vision-language tasks, such as molecule captioning and text-based de novo molecule generation (altogether: translation between molecules and language), which we explore for the first time. Since MolT5 pretrains models on single-modal data, it helps overcome the chemistry domain shortcoming of data scarcity. Furthermore, we consider several metrics, including a new cross-modal embedding-based metric, to evaluate the tasks of molecule captioning and text-based molecule generation. Our results show that MolT5-based models are able to generate outputs, both molecules and captions, which in many cases are high quality.
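A cross-modal embedding-based metric scores a caption and a molecule by how close they land in a shared embedding space. The following is a minimal sketch of the scoring step only. It assumes some pair of encoders (not shown, and hypothetical here) has already mapped each caption and each molecule to vectors of equal dimension. Cosine similarity averaged over aligned pairs is used as the score; this is an illustrative choice, not necessarily the paper's exact formula.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings of equal dimension."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cross_modal_score(
    text_embeddings: Sequence[Sequence[float]],
    molecule_embeddings: Sequence[Sequence[float]],
) -> float:
    """Mean cosine similarity over aligned (caption, molecule) pairs.

    Hypothetical: the embeddings are assumed to come from encoders that map
    text and molecules into one shared space; those encoders are not shown.
    """
    if len(text_embeddings) != len(molecule_embeddings) or not text_embeddings:
        raise ValueError("need the same, non-zero number of text and molecule embeddings")
    total = sum(
        cosine_similarity(t, m) for t, m in zip(text_embeddings, molecule_embeddings)
    )
    return total / len(text_embeddings)


# Toy vectors for illustration only, not outputs of any real encoder.
print(cross_modal_score([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]))  # 0.5
```

In the toy call, the first pair is perfectly aligned (similarity 1.0) and the second is orthogonal (0.0), so the mean is 0.5. Unlike string-overlap metrics, a score of this kind can reward a caption that describes the right molecule in different words.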
Code & Models
- 🤗 laituan245/molt5-base-caption2smiles (model) · 173 downloads · 1 like
- 🤗 laituan245/molt5-large-caption2smiles (model) · 369 downloads · 1 like
- 🤗 laituan245/molt5-small-smiles2caption (model) · 35 downloads · 2 likes
- 🤗 laituan245/molt5-large-smiles2caption (model) · 303 downloads · 4 likes
- 🤗 laituan245/molt5-small-caption2smiles (model) · 3 downloads · 2 likes
- 🤗 laituan245/molt5-base-smiles2caption (model) · 17k downloads · 3 likes
- 🤗 laituan245/molt5-large (model) · 4.8k downloads · 1 like
- 🤗 laituan245/molt5-base (model) · 14k downloads · 2 likes
- 🤗 laituan245/molt5-small (model) · 1.3k downloads · 2 likes
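The task-specific checkpoints listed above follow a `molt5-{size}-{task}` naming pattern, with `smiles2caption` for molecule captioning and `caption2smiles` for text-based molecule generation. Below is a minimal sketch of calling them with the Hugging Face `transformers` library. It assumes they load as standard T5 models; `T5Tokenizer` needs `sentencepiece` and the model needs PyTorch. `checkpoint_for` and `translate` are hypothetical helpers, and the beam width and maximum length are illustrative defaults, not values taken from the paper.

```python
SIZES = ("small", "base", "large")
TASKS = ("smiles2caption", "caption2smiles")


def checkpoint_for(task: str, size: str = "base") -> str:
    """Build a Hub model ID from the naming pattern of the listed checkpoints."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {TASKS}")
    if size not in SIZES:
        raise ValueError(f"size must be one of {SIZES}")
    return f"laituan245/molt5-{size}-{task}"


def translate(text: str, task: str, size: str = "base",
              num_beams: int = 5, max_length: int = 512) -> str:
    """Run one fine-tuned checkpoint on a SMILES string or a caption.

    Downloads the weights on first use. Assumes the checkpoint loads as a
    plain T5 encoder-decoder.
    """
    # Imported lazily so checkpoint_for works without transformers installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    name = checkpoint_for(task, size)
    tokenizer = T5Tokenizer.from_pretrained(name, model_max_length=max_length)
    model = T5ForConditionalGeneration.from_pretrained(name)
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    outputs = model.generate(input_ids, num_beams=num_beams, max_length=max_length)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


print(checkpoint_for("smiles2caption", "large"))  # laituan245/molt5-large-smiles2caption
```

For example, `translate("CC(=O)OC1=CC=CC=C1C(=O)O", "smiles2caption")` would ask for a caption of aspirin, and passing a textual description with `task="caption2smiles"` goes the other way. Loading inside `translate` keeps the sketch short. In practice one would load the tokenizer and model once and reuse them across calls.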
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Topic Modeling
