FORGE: Fragment-Oriented Ranking and Generation for Context-Aware Molecular Optimization

Qingchuan Zhang; He Cao; Hao Li; Yanjun Shao; Zhiyuan Liu; Shihang Wang; Shufang Xie; Shenghua Gao; Xinwu Ye

arXiv:2605.10230 · cs.LG · May 12, 2026

TL;DR

FORGE is a two-stage, fragment-oriented framework for context-aware molecular optimization that outperforms prior methods by using explicit fragment supervision instead of natural language prompts.

Contribution

FORGE introduces a scalable, fragment-level supervision approach built on automatically mined edit pairs and a compact 0.6B language model, improving molecular optimization without relying on natural-language data.
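Here is a minimal sketch of how low-to-high edit pairs might be mined for this kind of supervision. It uses a toy representation in which each molecule is a set of fragment labels with a single property score. The `Molecule` class, `mine_edit_pairs`, the single-fragment-swap rule, and the 0.5 Tanimoto threshold are illustrative choices, not the paper's pipeline. A real implementation would work on actual structures (for example, with RDKit fingerprints) and apply its own verification step.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Molecule:
    # Hypothetical toy representation: a molecule is a set of fragment labels.
    name: str
    fragments: frozenset
    score: float  # property value, higher is better


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two fragment sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mine_edit_pairs(mols, min_sim=0.5, min_gain=0.0):
    """Return (low, high, removed, added) records for single-fragment swaps
    that improve the score and keep similarity at or above min_sim."""
    pairs = []
    for a, b in combinations(mols, 2):
        low, high = (a, b) if a.score < b.score else (b, a)
        if high.score - low.score <= min_gain:
            continue  # not a strict low-to-high improvement
        removed = low.fragments - high.fragments
        added = high.fragments - low.fragments
        # Keep only local edits: exactly one fragment swapped for another.
        if len(removed) != 1 or len(added) != 1:
            continue
        if tanimoto(low.fragments, high.fragments) < min_sim:
            continue
        pairs.append((low.name, high.name, next(iter(removed)), next(iter(added))))
    return pairs


mols = [
    Molecule("m1", frozenset({"phenyl", "COOH", "Cl"}), 1.2),
    Molecule("m2", frozenset({"phenyl", "COOH", "F"}), 2.0),
    Molecule("m3", frozenset({"pyridyl", "NH2", "Br"}), 3.5),
]
pairs = mine_edit_pairs(mols)
print(pairs)
```

Only the chlorine-to-fluorine swap survives. The pyridyl compound differs from the others by more than one fragment, so it is not treated as a local edit.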

Findings

01. FORGE outperforms prior methods on multiple benchmarks.

02. Explicit fragment supervision reduces chemical hallucinations compared with prompt-conditioned language-model approaches.

03. The framework adapts to unseen objectives via in-context learning.
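The exact in-context prompt format is not given here. This sketch assumes one plausible design: a few fragment-level edits that improved the new objective are listed as demonstrations, followed by the query molecule. `build_icl_prompt`, the `Objective:`/`Molecule:`/`Edit:` layout, and the solubility example are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical demonstration format: (source SMILES, fragment removed, fragment added).
Demo = Tuple[str, str, str]


def build_icl_prompt(objective: str, demos: List[Demo], query_smiles: str) -> str:
    """Assemble a few-shot prompt whose demonstrations are explicit fragment
    edits rather than free-form natural-language instructions."""
    blocks = [f"Objective: {objective}"]
    for src, old, new in demos:
        blocks.append(f"Molecule: {src}\nEdit: {old} -> {new}")
    # The model is expected to complete the final "Edit:" line for the query.
    blocks.append(f"Molecule: {query_smiles}\nEdit:")
    return "\n\n".join(blocks)


demos = [
    ("c1ccccc1Cl", "Cl", "O"),  # chlorobenzene -> phenol
    ("Cc1ccccc1", "C", "N"),    # toluene -> aniline
]
prompt = build_icl_prompt("increase aqueous solubility", demos, "c1ccccc1Br")
print(prompt)
```

Because each demonstration is a concrete fragment swap, a new objective can be specified with a few such examples rather than with newly written text annotations.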

Abstract

Molecular optimization seeks to improve a molecule through small structural edits while preserving similarity to the starting compound. Recent language-model approaches typically treat this task as prompt-conditioned sequence generation. However, relying on natural language introduces an inherent data-scaling bottleneck, often leads to chemical hallucinations, and ignores the strong context dependence of fragment effects. We present FORGE, a two-stage framework that reformulates molecular optimization as context-aware local editing. FORGE learns from automatically mined, verified low-to-high edit pairs instead of expensive human text annotations: Stage 1 ranks candidate fragments by their property contribution under the full molecular context to inject chemical priors, and Stage 2 generates explicit fragment replacements. Built on a compact 0.6B language model, FORGE further adapts to unseen…
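The following toy sketch makes the two-stage decomposition concrete. It represents a molecule as a mapping from attachment sites to fragment labels and replaces both learned stages with an explicit oracle. `toy_prop` is an invented, context-dependent scoring function. Stage 1 is approximated by leave-one-out ablation (swapping a fragment for hydrogen), and Stage 2 by exhaustive search over a small fragment library. None of these names or numbers come from the paper. FORGE itself learns both stages from mined edit pairs.

```python
from typing import Callable, Dict, List, Tuple

Mol = Dict[str, str]  # hypothetical: attachment site -> fragment label
LIBRARY = ["H", "F", "OH", "NH2", "CH3", "Cl"]


def rank_fragments(mol: Mol, prop: Callable[[Mol], float]) -> List[Tuple[str, float]]:
    """Stage 1 stand-in: score each fragment by how much the property drops
    when it is replaced by hydrogen, evaluated in the full molecular context."""
    base = prop(mol)
    contributions = [(site, base - prop({**mol, site: "H"})) for site in mol]
    # Lowest contribution first: these sites are the most promising to edit.
    return sorted(contributions, key=lambda item: item[1])


def propose_replacement(mol: Mol, site: str, library: List[str],
                        prop: Callable[[Mol], float]) -> Mol:
    """Stage 2 stand-in: choose the library fragment that maximizes the
    property at the chosen site (a learned model would generate it instead)."""
    candidates = [{**mol, site: frag} for frag in library if frag != mol[site]]
    return max(candidates, key=prop)


def toy_prop(mol: Mol) -> float:
    """Invented property whose R2 effect depends on the fragment at R1."""
    score = {"OH": 1.0, "NH2": 0.8, "Cl": 0.2}.get(mol["R1"], 0.0)
    r2_effect = {"F": 0.5, "OH": 0.4, "CH3": -0.3}.get(mol["R2"], 0.0)
    # Interaction term: fluorine at R2 hurts when R1 is hydroxyl.
    if mol["R2"] == "F" and mol["R1"] == "OH":
        r2_effect = -0.5
    return score + r2_effect


mol = {"R1": "Cl", "R2": "CH3"}
ranking = rank_fragments(mol, toy_prop)
worst_site = ranking[0][0]
edited = propose_replacement(mol, worst_site, LIBRARY, toy_prop)

# Same site, different context: the best replacement at R2 changes.
edited_oh = propose_replacement({"R1": "OH", "R2": "CH3"}, "R2", LIBRARY, toy_prop)
print(ranking, edited, edited_oh)
```

With `R1` set to `Cl`, the search picks `F` at `R2`. With `R1` set to `OH`, the same search picks `OH`. This is the kind of context dependence of fragment effects that the abstract says prompt-based methods ignore.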