Molecular Structure Extraction From Documents Using Deep Learning

Joshua Staker; Kyle Marshall; Robert Abel; Carolyn McQuaw

arXiv:1802.04903·cs.LG·February 15, 2018


TL;DR

This paper introduces an end-to-end deep learning method for extracting and predicting molecular structures from document images, overcoming challenges posed by diverse styles, annotations, and image quality variations.

Contribution

The paper presents a novel deep learning approach that eliminates the need for handcrafted features in molecular structure extraction from documents.

Findings

01 Effective on low-resolution images

02 Robust against style and quality variations

03 Improves recognition rates over traditional methods

Abstract

Chemical structure extraction from documents remains a hard problem due to both false positive identification of structures during segmentation and errors in the predicted structures. Current approaches rely on handcrafted rules and subroutines that generally perform reasonably well but still routinely encounter situations where recognition rates are not yet satisfactory and systematic improvement is challenging. Complications impacting the performance of current approaches include the diversity of visual styles used by various software to render structures, the frequent use of ad hoc annotations, and other challenges related to image quality, including resolution and noise. Here we present end-to-end deep learning solutions both for segmenting molecular structures from documents and for predicting chemical structures from these segmented images. This deep learning-based approach does not…
