SMiCRM: A Benchmark Dataset of Mechanistic Molecular Images
Ching Ting Leung, Yufan Chen, Hanyu Gao

TL;DR
SMiCRM is a new benchmark dataset of 453 annotated chemical reaction images with mechanistic arrows, designed to benchmark machine recognition of molecular structures and electron flow in chemical images.
Contribution
The paper introduces SMiCRM, a novel dataset with arrow-pushing annotations for benchmarking chemical image recognition systems, especially in mechanistic contexts.
Findings
Provides a challenging dataset for optical chemical structure recognition (OCSR) systems.
Enables benchmarking of mechanistic electron flow recognition.
Facilitates development of more accurate chemical image recognition tools.
Abstract
Optical chemical structure recognition (OCSR) systems aim to extract molecular structure information, usually in the form of a molecular graph or a SMILES string, from images of chemical molecules. While many tools have been developed for this purpose, challenges remain because of the various types of noise that may be present in the images. Specifically, we focus on 'arrow-pushing' diagrams, a common type of chemical image used to depict electron flow in mechanistic steps. We present Structural molecular identifier of Molecular images in Chemical Reaction Mechanisms (SMiCRM), a dataset designed to benchmark machine recognition of chemical molecules with arrow-pushing annotations. Comprising 453 images, it spans a broad array of organic chemical reactions, each illustrated with molecular structures and mechanistic arrows. SMiCRM offers a rich collection of annotated molecule…
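The summary above does not state which scoring metric SMiCRM uses. As a minimal sketch, assuming an OCSR tool outputs one SMILES string per image and the ground truth is also SMILES, exact-match accuracy over a benchmark could be computed as below. The function name `exact_match_accuracy` and its `normalize` hook are hypothetical illustrations, not part of the dataset's tooling.

```python
from typing import Callable, Sequence


def exact_match_accuracy(
    predictions: Sequence[str],
    references: Sequence[str],
    normalize: Callable[[str], str] = str.strip,
) -> float:
    """Fraction of images whose predicted SMILES equals the reference after normalization.

    Hypothetical helper: by default it only strips whitespace, so two valid
    but differently written SMILES for the same molecule count as a mismatch.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one reference")
    # Count positions where the normalized strings are identical.
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


if __name__ == "__main__":
    preds = ["CCO", "c1ccccc1", "CC(=O)O"]
    refs = ["CCO", "c1ccccc1", "CC(O)=O"]  # acetic acid written a different way
    print(exact_match_accuracy(preds, refs))
```

In the usage example, "CC(=O)O" and "CC(O)=O" both describe acetic acid but fail a plain string comparison, so the score is 2/3. A practical evaluation would pass a canonicalizing `normalize`, for instance one built on RDKit's `Chem.MolToSmiles(Chem.MolFromSmiles(s))`, so that equivalent structures compare equal.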
