Bridging the Gap Between Molecule and Textual Descriptions via Substructure-aware Alignment

Hyuntae Park, Yeachan Kim, SangKeun Lee

arXiv:2510.26157 · cs.LG · October 31, 2025

TL;DR

MolBridge is a new framework that improves molecule-text understanding by learning fine-grained alignments between molecular substructures and chemical descriptions, leading to better performance on molecular benchmarks.

Contribution

It introduces substructure-aware contrastive learning and a self-refinement mechanism to enhance molecule-text alignment, addressing previous models' inability to learn fine-grained alignments between molecular substructures and chemical phrases.

Findings

01. Outperforms state-of-the-art baselines on molecular benchmarks.
02. Effectively captures fine-grained molecule-text correspondences.
03. Demonstrates the importance of substructure-aware alignment.

Abstract

Molecule and text representation learning has gained increasing interest due to its potential for enhancing the understanding of chemical information. However, existing models often struggle to capture subtle differences between molecules and their descriptions, as they lack the ability to learn fine-grained alignments between molecular substructures and chemical phrases. To address this limitation, we introduce MolBridge, a novel molecule-text learning framework based on substructure-aware alignments. Specifically, we augment the original molecule-description pairs with additional alignment signals derived from molecular substructures and chemical phrases. To effectively learn from these enriched alignments, MolBridge employs substructure-aware contrastive learning, coupled with a self-refinement mechanism that filters out noisy alignment signals. Experimental results show that…
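The abstract describes two ingredients: contrastive learning over paired molecule-side and text-side alignment signals, and a self-refinement step that filters out noisy signals. As a rough illustration only, the sketch below shows a generic symmetric InfoNCE contrastive objective over paired embeddings, preceded by a simple cosine-similarity threshold filter. This is not MolBridge's implementation: the toy embeddings, the `threshold` and `temperature` values, and the helper names `cosine`, `filter_pairs`, and `info_nce` are hypothetical, and the paper's actual encoders, loss formulation, and refinement criterion are not described on this page.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]  # (molecule-side embedding, text-side embedding)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pairs(pairs: List[Pair], threshold: float) -> List[Pair]:
    """Hypothetical stand-in for self-refinement: keep only pairs whose current
    similarity reaches `threshold`, treating the rest as noisy alignment signals."""
    return [(m, t) for m, t in pairs if cosine(m, t) >= threshold]


def info_nce(pairs: List[Pair], temperature: float = 0.1) -> float:
    """Symmetric InfoNCE: each molecule embedding should score its own text partner
    above every other text in the batch, and each text likewise for molecules."""
    if not pairs:
        return 0.0
    mols = [m for m, _ in pairs]
    texts = [t for _, t in pairs]
    # logits[i][j] = similarity(mol_i, text_j) / temperature
    logits = [[cosine(m, t) / temperature for t in texts] for m in mols]

    def cross_entropy(rows: List[List[float]]) -> float:
        # Mean over rows of -log softmax(row)[i], where column i is the positive.
        total = 0.0
        for i, row in enumerate(rows):
            top = max(row)  # subtract the max for numerical stability
            log_sum = top + math.log(sum(math.exp(x - top) for x in row))
            total += log_sum - row[i]
        return total / len(rows)

    columns = [list(col) for col in zip(*logits)]  # text -> molecule direction
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))


# Toy usage: the third pair is poorly aligned and gets filtered out.
pairs = [
    ([1.0, 0.0], [0.9, 0.1]),
    ([0.0, 1.0], [0.1, 0.9]),
    ([1.0, 1.0], [-1.0, 0.2]),
]
kept = filter_pairs(pairs, threshold=0.5)
print(len(kept))  # → 2
print(info_nce(kept))
```

The symmetric form averages the molecule-to-text and text-to-molecule directions so that neither modality dominates; filtering before computing the loss keeps pairs judged unreliable from acting as positives.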
