PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes

He Cao, Yanjun Shao, Zhiyuan Liu, Zijing Liu, Xiangru Tang, Yuan Yao, and Yu Li

arXiv:2406.13193·cs.LG·June 21, 2024

TL;DR

PRESTO is a novel framework that enhances multimodal language models for synthetic chemistry by progressively improving molecule-text understanding and multi-graph interactions, leading to better task performance.

Contribution

It introduces a progressive pretraining approach that integrates cross-modal alignment and multi-graph understanding for synthetic chemistry applications.

Findings

01. PRESTO achieves competitive results in synthetic chemistry tasks.

02. The framework effectively bridges molecule-text modality gaps.

03. Extensive experiments validate PRESTO's improvements.

Abstract

Multimodal Large Language Models (MLLMs) have seen growing adoption across various scientific disciplines. These advancements encourage the investigation of molecule-text modeling within synthetic chemistry, a field dedicated to designing and conducting chemical reactions to synthesize new compounds with desired properties and applications. Current approaches, however, often neglect the critical role of interactions among multiple molecule graphs in understanding chemical reactions, leading to suboptimal performance in synthetic chemistry tasks. This study introduces PRESTO (Progressive Pretraining Enhances Synthetic Chemistry Outcomes), a new framework that bridges the molecule-text modality gap by integrating a comprehensive benchmark of pretraining strategies and dataset configurations. It progressively improves multimodal LLMs through cross-modal alignment and multi-graph understanding. Our…

Code & Models

Repositories

idea-xl/presto (PyTorch, official)

Videos

PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes · Underline

Taxonomy

Topics: Mobile Learning in Education