Molecular Structure Extraction From Documents Using Deep Learning
Joshua Staker, Kyle Marshall, Robert Abel, Carolyn McQuaw

TL;DR
This paper introduces end-to-end deep learning methods for segmenting molecular structures from document images and predicting the chemical structure in each segmented image, targeting the difficulties caused by diverse rendering styles, ad hoc annotations, and variable image quality.
Contribution
The paper presents a novel deep learning approach that eliminates the need for handcrafted features in molecular structure extraction from documents.
Findings
Effective on low-resolution images
Robust against style and quality variations
Improves recognition rates over traditional methods
Abstract
Chemical structure extraction from documents remains a hard problem due to both false positive identification of structures during segmentation and errors in the predicted structures. Current approaches rely on handcrafted rules and subroutines that perform reasonably well generally, but still routinely encounter situations where recognition rates are not yet satisfactory and systematic improvement is challenging. Complications impacting performance of current approaches include the diversity in visual styles used by various software to render structures, the frequent use of ad hoc annotations, and other challenges related to image quality, including resolution and noise. We here present end-to-end deep learning solutions for both segmenting molecular structures from documents and for predicting chemical structures from these segmented images. This deep learning-based approach does not…
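The two stages named in the abstract (segment structures from a page, then predict a structure from each segmented image) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: `segment` and `predict` are hypothetical callables standing in for the learned models, and grouping mask pixels into boxes with a connected-component search is an assumption made here for illustration only.

```python
from collections import deque
from typing import Callable, List, Tuple

Image = List[List[int]]          # toy grayscale/binary image as nested lists
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def mask_to_boxes(mask: Image) -> List[Box]:
    """Group 4-connected foreground pixels of a binary mask into bounding boxes."""
    rows = len(mask)
    cols = len(mask[0]) if mask else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over one connected region.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, left = min(top, y), min(left, x)
                bottom, right = max(bottom, y), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def crop(page: Image, box: Box) -> Image:
    """Cut the inclusive bounding box out of the page."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in page[top:bottom + 1]]


def extract_structures(
    page: Image,
    segment: Callable[[Image], Image],   # hypothetical: page -> binary mask
    predict: Callable[[Image], str],     # hypothetical: crop -> structure string
) -> List[Tuple[Box, str]]:
    """Stage 1: segment the page. Stage 2: predict a structure per cropped region."""
    mask = segment(page)
    return [(box, predict(crop(page, box))) for box in mask_to_boxes(mask)]
```

In use, `segment` would wrap a trained segmentation model and `predict` a trained image-to-structure model; keeping them as plain callables lets either stage be swapped or tested in isolation.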
