COMO: Closed-Loop Optical Molecule Recognition with Minimum Risk Training

Zhuoqi Lyu; Qing Ke

arXiv:2604.23546 · cs.CV · April 28, 2026

TL;DR

COMO is a closed-loop framework for optical chemical structure recognition (OCSR) that uses Minimum Risk Training (MRT) to optimize molecule-level criteria directly, and it outperforms existing methods on ten benchmarks.

Contribution

The paper proposes COMO, a closed-loop OCSR system using MRT to directly optimize molecule-level objectives, addressing exposure bias and improving recognition accuracy.
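This page does not spell out COMO's loss. For orientation, the standard MRT objective from neural machine translation (Shen et al., 2016) is shown below next to the teacher-forced MLE loss it is meant to complement or replace. Treating it as COMO's exact objective, including the candidate set, the sharpness α, and the cost Δ, is an assumption.

```latex
% Teacher-forced MLE: every token is conditioned on the ground-truth prefix y^*_{<t}
\mathcal{L}_{\mathrm{MLE}}(\theta) = -\sum_{(x,\,y^{*}) \in \mathcal{D}} \sum_{t=1}^{|y^{*}|} \log P\!\left(y^{*}_{t} \mid y^{*}_{<t}, x; \theta\right)

% MRT: expected molecule-level cost over candidates decoded by the model itself
\mathcal{R}(\theta) = \sum_{(x,\,y^{*}) \in \mathcal{D}} \sum_{y \in \mathcal{S}(x)} Q(y \mid x; \theta, \alpha)\, \Delta(y, y^{*}),
\qquad
Q(y \mid x; \theta, \alpha) = \frac{P(y \mid x; \theta)^{\alpha}}{\sum_{y' \in \mathcal{S}(x)} P(y' \mid x; \theta)^{\alpha}}
```

Here x is a molecular image, y* its reference SMILES string, S(x) a set of candidates decoded by the current model, α a sharpness hyperparameter, and Δ a molecule-level cost; one plausible (assumed) choice is one minus Tanimoto similarity, with the maximum cost for chemically invalid outputs. Because every y in S(x) comes from the model's own decoding rather than from y*, the risk is measured on the kind of sequences the model produces at inference, which is how MRT targets exposure bias.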

Findings

1. COMO outperforms existing methods on ten benchmarks.
2. MRT is architecture-agnostic and enhances model training.
3. The approach requires less training data than previous methods.
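One way to read the architecture-agnostic finding is that MRT only needs two operations from a model: sample candidate outputs for an image and score their log-probabilities. The sketch below is a minimal illustration under that assumption, not COMO's implementation. `SequenceModel`, `mrt_risk`, `molecule_cost`, `mrt_objective`, `toy_fingerprint`, and `ToyModel` are all hypothetical names; the cost (one minus Tanimoto similarity, maximum cost for invalid molecules) is an assumed choice, and the character-bigram "fingerprint" stands in for a real one such as an RDKit Morgan fingerprint.

```python
from __future__ import annotations

import math
from typing import Callable, Optional, Protocol, Sequence


class SequenceModel(Protocol):
    """Hypothetical interface: any image-to-SMILES decoder that can sample and score."""

    def sample(self, image: object, k: int) -> list[str]: ...

    def log_prob(self, image: object, smiles: str) -> float: ...


def mrt_risk(log_probs: Sequence[float], costs: Sequence[float], alpha: float = 1.0) -> float:
    """Expected cost under Q(y) proportional to P(y)**alpha, renormalized over the candidates."""
    scaled = [alpha * lp for lp in log_probs]
    shift = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return sum(w / total * c for w, c in zip(weights, costs))


def molecule_cost(candidate_fp: Optional[frozenset], reference_fp: frozenset) -> float:
    """1 - Tanimoto similarity; an invalid candidate (fp is None) gets the maximum cost 1."""
    if candidate_fp is None:
        return 1.0
    union = len(candidate_fp | reference_fp)
    if union == 0:
        return 0.0
    return 1.0 - len(candidate_fp & reference_fp) / union


def mrt_objective(
    model: SequenceModel,
    image: object,
    reference_smiles: str,
    fingerprint: Callable[[str], Optional[frozenset]],
    k: int = 4,
    alpha: float = 1.0,
) -> float:
    """Sample k candidates from the model, score them, and return the MRT risk for one image."""
    candidates = model.sample(image, k)
    reference_fp = fingerprint(reference_smiles)
    log_probs = [model.log_prob(image, c) for c in candidates]
    costs = [molecule_cost(fingerprint(c), reference_fp) for c in candidates]
    return mrt_risk(log_probs, costs, alpha)


# --- toy usage (hypothetical stand-ins) ---
def toy_fingerprint(smiles: str) -> Optional[frozenset]:
    """Character bigrams as a fake fingerprint; '?' marks an unparseable string."""
    if "?" in smiles:
        return None
    return frozenset(smiles[i : i + 2] for i in range(len(smiles) - 1))


class ToyModel:
    def sample(self, image: object, k: int) -> list[str]:
        return ["CCO", "CCN", "C?O"][:k]

    def log_prob(self, image: object, smiles: str) -> float:
        return {"CCO": math.log(0.6), "CCN": math.log(0.3), "C?O": math.log(0.1)}[smiles]


risk = mrt_objective(ToyModel(), None, "CCO", toy_fingerprint, k=3)
print(risk)  # 0.6 * 0 + 0.3 * (2/3) + 0.1 * 1, i.e. about 0.3
```

The functions above work on plain floats to stay dependency-free; in actual training the same risk would be computed on framework tensors so that gradients flow through the candidates' log-probabilities. Nothing in `mrt_objective` depends on the decoder's architecture, which is the point of the interface.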

Abstract

Optical chemical structure recognition (OCSR) translates molecular images into machine-readable representations such as SMILES strings or molecular graphs, but it remains challenging on real-world documents because of the virtually unlimited variety of chemical structures, shorthand conventions, and visual noise. Most existing deep-learning approaches are trained with teacher forcing under a token-level Maximum Likelihood Estimation (MLE) objective. This paradigm suffers from exposure bias: models are trained on ground-truth prefixes but must condition on their own previous predictions at inference time. Moreover, a token-level MLE objective makes it difficult to optimize directly for molecule-level evaluation criteria such as chemical validity and structural similarity. Here we introduce Minimum Risk Training (MRT) to OCSR and propose COMO (Closed-loop Optical Molecule recOgnition), a closed-loop framework that…
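The exposure-bias problem can be reproduced with a toy example. This sketch is illustrative only: `toy_model` is a hypothetical next-token function that is correct on every ground-truth prefix except one, tokens are single characters (real SMILES tokenizers also handle multi-character tokens such as `Cl` and `Br`), and the target is acetic acid, `CC(=O)O`.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical "model": maps a decoded prefix to its predicted next token.
NextToken = Callable[[str], str]

TARGET = "CC(=O)O"  # acetic acid; one character = one token in this toy


def toy_model(prefix: str) -> str:
    """Correct on ground-truth prefixes except after "CC"; falls back to "C" elsewhere."""
    if prefix == "CC":
        return "C"  # the single mistake (the target token here is "(")
    if TARGET.startswith(prefix) and len(prefix) < len(TARGET):
        return TARGET[len(prefix)]
    return "C"  # unfamiliar prefix: guess the most common token


def teacher_forced_predictions(model: NextToken, target: str) -> list[str]:
    """Training-time view: step i is conditioned on the ground-truth prefix target[:i]."""
    return [model(target[:i]) for i in range(len(target))]


def free_running_decode(model: NextToken, length: int) -> str:
    """Inference-time view: each step is conditioned on the model's own earlier outputs."""
    out = ""
    for _ in range(length):
        out += model(out)
    return out


def token_accuracy(predicted, target: str) -> float:
    """Fraction of positions where the predicted token matches the target token."""
    return sum(p == t for p, t in zip(predicted, target)) / len(target)


tf = teacher_forced_predictions(toy_model, TARGET)
fr = free_running_decode(toy_model, len(TARGET))
print(tf, token_accuracy(tf, TARGET))  # 6 of 7 tokens correct
print(fr, token_accuracy(fr, TARGET))  # "CCCCCCC": 2 of 7 tokens correct
```

Under teacher forcing the single slip costs one token out of seven, so a token-level loss sees a nearly correct model. In free-running decoding the wrong `C` moves the model onto a prefix it never saw during training, every later token is wrong, and the output is `CCCCCCC` (heptane) instead of acetic acid.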

Code & Models

Models

Keylab/COMO (Hugging Face model)