One Strike, You're Out: Detecting Markush Structures in Low Signal-to-Noise Ratio Images
Thomas Jurriaans, Kinga Szarkowska, Eric Nalisnick, Markus Schwoerer, Camilo Thorne, Saber Akhondi

TL;DR
This paper proposes a CNN-based method for classifying images of Markush chemical structures, so that they can be filtered out before they cause errors in Optical Chemical Structure Recognition (OCSR) systems.
Contribution
The study compares fixed-feature extraction with end-to-end CNN learning, demonstrating the superior effectiveness of the CNN approach for Markush structure classification.
Findings
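The end-to-end CNN achieved a Macro F1 of 0.928 (0.035 SD), compared with 0.701 (0.052 SD) for the fixed-feature method.
Because of the nature of the experiment, the reported figures are a lower bound and can be improved further.
The results suggest that Markush structures, which OCSR cannot parse correctly, can be filtered out.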
Abstract
Modern research increasingly relies on automated methods to assist researchers. An example of this is Optical Chemical Structure Recognition (OCSR), which aids chemists in retrieving information about chemicals from large amounts of documents. Markush structures are chemical structures that cannot be parsed correctly by OCSR and cause errors. The focus of this research was to propose and test a novel method for classifying Markush structures. Within this method, a comparison was made between fixed-feature extraction and end-to-end learning (CNN). The end-to-end method performed significantly better than the fixed-feature method, achieving 0.928 (0.035 SD) Macro F1 compared to the fixed-feature method's 0.701 (0.052 SD). Because of the nature of the experiment, these figures are a lower bound and can be improved further. These results suggest that Markush structures can be filtered out…
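The abstract reports results as Macro F1 with a standard deviation. As a rough illustration of what that metric measures, the sketch below computes the F1 score for each class separately and averages them with equal weight, then summarizes several evaluation runs as a mean and SD. The labels, the two toy runs, and the function names are hypothetical and are not taken from the paper; the authors' evaluation protocol may differ.

```python
from statistics import mean, stdev


def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (each class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    return mean([f1_for_class(y_true, y_pred, c) for c in labels])


# Hypothetical toy data: 1 = Markush, 0 = non-Markush.
# Each tuple is (true labels, predicted labels) for one evaluation run.
runs = [
    ([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0]),
    ([1, 0, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]),
]

scores = [macro_f1(t, p) for t, p in runs]
print(f"Macro F1: {mean(scores):.3f} ({stdev(scores):.3f} SD)")
```

Because every class contributes equally to the average, a classifier cannot reach a high Macro F1 by doing well only on the more frequent class.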
Taxonomy
Topics: Computational Drug Discovery Methods · Metabolomics and Mass Spectrometry Studies · Spectroscopy and Chemometric Analyses
