Vision Language Model is NOT All You Need: Augmentation Strategies for Molecule Language Models

Namkyeong Lee; Siddhartha Laghuvarapu; Chanyoung Park; Jimeng Sun

arXiv:2407.09043 · cs.AI · July 24, 2024 · 1 citation

TL;DR

This paper introduces AMOLE, a novel augmentation strategy for molecule language models that leverages structural similarity and expertise transfer to improve understanding of molecules and their descriptions, addressing data scarcity and expertise gaps.

Contribution

AMOLE is the first method to combine a structural similarity preserving loss with expertise transfer to enhance molecule language models.

Findings

1. AMOLE outperforms existing models on various downstream tasks.

2. Structural similarity preservation improves molecule-text understanding.

3. Expertise transfer enhances model performance on less studied molecules.

Abstract

Recently, there has been a growing interest among researchers in understanding molecules and their textual descriptions through molecule language models (MoLM). However, despite some early promising developments, the advancement of MoLM still trails significantly behind that of vision language models (VLM). This is because unique challenges exist apart from VLM in the field of MoLM due to 1) a limited amount of molecule-text paired data and 2) missing expertise that occurred due to the specialized areas of focus among the experts. To this end, we propose AMOLE, which 1) augments molecule-text pairs with structural similarity preserving loss, and 2) transfers the expertise between the molecules. Specifically, AMOLE enriches molecule-text pairs by sharing descriptions among structurally similar molecules with a novel structural similarity preserving loss. Moreover, we propose an expertise…
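
To make the idea of a structural similarity preserving loss more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that structural similarity is measured as Tanimoto similarity over sets of fingerprint bits, and that the loss is a cross-entropy between a structure-derived soft target distribution and the model's molecule-text score distribution. The function names (`tanimoto`, `log_softmax`, `similarity_preserving_loss`) and the temperature `tau` are hypothetical choices made for illustration.

```python
import math


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def log_softmax(scores, temperature=1.0):
    """Numerically stable log-softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]


def similarity_preserving_loss(anchor_fp, candidate_fps, model_scores, tau=0.1):
    """Cross-entropy between structural soft targets and model predictions.

    anchor_fp:     fingerprint bits of the molecule that owns the description
    candidate_fps: fingerprint bits of the molecules the description is shared with
    model_scores:  model similarity between the description and each candidate
    All of this is an illustrative assumption, not the paper's exact formulation.
    """
    structural = [tanimoto(anchor_fp, fp) for fp in candidate_fps]
    target = [math.exp(lp) for lp in log_softmax(structural, tau)]
    log_pred = log_softmax(model_scores, tau)
    return -sum(t * lp for t, lp in zip(target, log_pred))


anchor = {1, 2, 3}
candidates = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
aligned = similarity_preserving_loss(anchor, candidates, [0.9, 0.4, 0.1])
misaligned = similarity_preserving_loss(anchor, candidates, [0.1, 0.4, 0.9])
print(aligned < misaligned)
```

In this sketch, a description is not treated as a hard positive for only its own molecule. Structurally similar molecules receive a share of the target probability mass, so the loss is lower when the model's scores follow the structural ranking than when they contradict it.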

Code & Models

Repositories

namkyeong/amole (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems

Methods: Focus