How Much Chemistry Does a Deep Neural Network Need to Know to Make Accurate Predictions?

Garrett B. Goh; Charles Siegel; Abhinav Vishnu; Nathan O. Hodas; Nathan Baker

arXiv:1710.02238·stat.ML·August 17, 2018

TL;DR

This study shows that simple augmentation of molecular images with basic domain-specific information enhances deep learning predictions of chemical properties, indicating that complex chemical knowledge isn't necessary for accurate modeling.

Contribution

The paper introduces AugChemception, an improved CNN model that outperforms the original Chemception by adding minimal domain-specific information to the input images, without changing the architecture.

Findings

01 Augmenting the image channels with basic chemical information improves prediction accuracy.

02 Different learning patterns are observed for toxicity/activity versus solvation energy.

03 Deep models can predict chemical properties without extensive chemical domain knowledge.

Abstract

The meteoric rise of deep learning models in computer vision research, which have achieved human-level accuracy in image recognition tasks, is firm evidence of the impact of representation learning in deep neural networks. In the chemistry domain, recent advances have also led to the development of similar CNN models, such as Chemception, which is trained to predict chemical properties using images of molecular drawings. In this work, we investigate the effects of systematically removing and adding localized domain-specific information to the image channels of the training data. By augmenting images with only 3 additional pieces of basic information, and without introducing any architectural changes, we demonstrate that an augmented Chemception (AugChemception) outperforms the original model in the prediction of toxicity, activity, and solvation free energy. Then, by altering the information content in…
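
The idea of adding or removing localized information in image channels can be illustrated with a small sketch. This is not the authors' code: the grid size, the helper names (`rasterize`, `build_image`), the toy coordinates, and in particular the choice of per-atom properties (`partial_charge`, `valence`, `hybridization`) are assumptions made only for illustration. Each per-atom property is written into its own channel at the pixel where the atom sits, so augmenting or ablating information only changes the number of channels, not the network architecture.

```python
import numpy as np

def rasterize(coords, values, size=8, resolution=1.0):
    """Write one per-atom value into a single size x size channel.

    Each atom lands on the pixel nearest to its (x, y) position;
    all other pixels stay zero.
    """
    channel = np.zeros((size, size), dtype=np.float32)
    for (x, y), v in zip(coords, values):
        col = int(round(x / resolution))
        row = int(round(y / resolution))
        if 0 <= row < size and 0 <= col < size:
            channel[row, col] = v
    return channel

def build_image(coords, features, names):
    """Stack one channel per selected feature into an (H, W, C) image."""
    return np.stack([rasterize(coords, features[n]) for n in names], axis=-1)

# Toy three-atom fragment. Coordinates and property values are made up,
# and the property names are hypothetical, not the paper's channel list.
coords = [(1.0, 1.0), (3.0, 1.0), (5.0, 2.0)]
features = {
    "atomic_number": [6, 8, 6],
    "partial_charge": [0.10, -0.40, 0.05],
    "valence": [4, 2, 4],
    "hybridization": [3, 2, 3],
}

# Baseline image: a single information channel.
base = build_image(coords, features, ["atomic_number"])

# Augmented image: the same channel plus three extra ones.
augmented = build_image(
    coords,
    features,
    ["atomic_number", "partial_charge", "valence", "hybridization"],
)

print(base.shape)       # (8, 8, 1)
print(augmented.shape)  # (8, 8, 4)
```

Removing information works the same way in this sketch: passing a shorter list of names to `build_image` produces an image with fewer channels, which is one way an ablation over channel content could be set up.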
