Unifying Molecular and Textual Representations via Multi-task Language Modelling
Dimitrios Christofidellis, Giorgio Giannone, Jannis Born, Ole Winther, Teodoro Laino, Matteo Manica

TL;DR
This paper introduces a multi-task language model that unifies chemical and natural language representations, enabling efficient cross-domain task solving without domain-specific fine-tuning, thus advancing data-driven scientific discovery.
Contribution
It presents the first multi-domain, multi-task language model capable of handling both chemical and natural language tasks simultaneously, improving performance through shared representations.
Findings
Sharing weights across domains improves task performance.
The model outperforms state-of-the-art baselines on multiple benchmarks.
Cross-domain sharing enhances results more as scale increases.
Abstract
The recent advances in neural language models have also been successfully applied to the field of chemistry, offering generative solutions for classical problems in molecular design and synthesis planning. These new methods have the potential to fuel a new era of data-driven automation in scientific discovery. However, specialized models are still typically required for each task, leading to the need for problem-specific fine-tuning and neglecting task interrelations. The main obstacle in this field is the lack of a unified representation between natural language and chemical representations, complicating and limiting human-machine interaction. Here, we propose the first multi-domain, multi-task language model that can solve a wide range of tasks in both the chemical and natural language domains. Our model can handle chemical and natural language concurrently, without requiring…
Code & Models
- 🤗 GT4SD/multitask-text-and-chemistry-t5-small-standard · model · 39 dl · ♡ 2
- 🤗 GT4SD/multitask-text-and-chemistry-t5-small-augm · model · 201 dl · ♡ 2
- 🤗 GT4SD/multitask-text-and-chemistry-t5-base-standard · model · 657 dl · ♡ 6
- 🤗 GT4SD/multitask-text-and-chemistry-t5-base-augm · model · 26k dl · ♡ 10
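The checkpoints above can probably be loaded with the Hugging Face `transformers` library. The following is a minimal sketch, not the authors' documented usage. It assumes that the checkpoints are standard T5 sequence-to-sequence models (suggested by the `t5` in their names), that `transformers`, `torch`, and `sentencepiece` are installed, and that the Hub IDs are the listed names without the `model` type label that follows each one. The helper names `checkpoint_id` and `generate` are hypothetical.

```python
# Sizes and variants as they appear in the checkpoint names listed above.
SIZES = ("small", "base")
VARIANTS = ("standard", "augm")


def checkpoint_id(size: str, variant: str) -> str:
    """Build the Hub ID for one of the four listed checkpoints."""
    if size not in SIZES or variant not in VARIANTS:
        raise ValueError(f"unknown checkpoint: size={size!r}, variant={variant!r}")
    return f"GT4SD/multitask-text-and-chemistry-t5-{size}-{variant}"


def generate(prompt: str, size: str = "base", variant: str = "augm",
             max_new_tokens: int = 128) -> str:
    """Run one text-to-text generation with the chosen checkpoint.

    Assumption: the checkpoint loads through the generic seq2seq Auto classes.
    The model is downloaded from the Hub on first use.
    """
    # Imported lazily so checkpoint_id() works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model_id = checkpoint_id(size, variant)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Tokenize the prompt, generate output token IDs, decode back to text.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call such as `generate("Caption the following SMILES: CCO")` would then return the model's text output. That prompt wording is only an illustrative guess; the task prefixes the model was trained with are not given here and should be taken from the model cards.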
Taxonomy
Topics: Machine Learning in Materials Science · Topic Modeling · Computational Drug Discovery Methods
