Augmenting Molecular Images with Vector Representations as a Featurization Technique for Drug Classification
Daniel de Marchi, Amarjit Budhiraja

TL;DR
This paper introduces a novel molecular featurization technique combining images with binary vector representations, leading to improved drug classification performance and faster convergence.
Contribution
It proposes augmenting molecular images with Morgan fingerprints and MACCS keys, enhancing feature richness for deep learning models in drug classification.
Findings
Achieved state-of-the-art AUC ROC on the HIV dataset
Model converged faster and required less computational power
Outperformed previous molecular featurization methods
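For readers unfamiliar with the metric in these findings: AUC ROC can be read as the probability that a randomly chosen positive example (here, an HIV inhibitor) receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not values or code from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC ROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = inhibits HIV, 0 = does not.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of examples, which keeps the definition visible but would be slow on a dataset of tens of thousands of molecules; a rank-based computation gives the same value more efficiently.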
Abstract
One of the key steps in building deep learning systems for drug classification and generation is the choice of featurization for the molecules. Previous featurization methods have included molecular images, binary strings, graphs, and SMILES strings. This paper proposes the creation of molecular images captioned with binary vectors that encode information not contained in or easily understood from a molecular image alone. Specifically, we use Morgan fingerprints, which encode higher-level structural information, and MACCS keys, which encode yes-or-no questions about a molecule's properties and structure. We tested our method on the HIV dataset published by the Pande lab, which consists of 41,127 molecules labeled by whether they inhibit HIV. Our final model achieved a state-of-the-art AUC ROC on the HIV dataset, outperforming all other methods. Moreover, the model converged…
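To make the idea of "captioning" an image with a binary vector concrete, here is a minimal sketch that appends a strip of black and white cells, one per bit, beneath a grayscale image. It is an illustration under stated assumptions, not the authors' layout: the function name `caption_image`, the cell size, the black-for-1 convention, and the toy inputs are all hypothetical, and the bit vector is taken as given (in practice it would come from a cheminformatics toolkit that computes Morgan fingerprints or MACCS keys).

```python
from typing import List


def caption_image(image: List[List[int]], bits: List[int], cell: int = 2) -> List[List[int]]:
    """Return a copy of a grayscale image with `bits` drawn as cells below it.

    Assumed convention (not from the paper): a set bit is a black cell (0),
    an unset bit is a white cell (255), and unused space is padded white.
    """
    if not image or any(len(row) != len(image[0]) for row in image):
        raise ValueError("image must be a non-empty rectangular grid")
    if any(b not in (0, 1) for b in bits):
        raise ValueError("bits must contain only 0 or 1")
    width = len(image[0])
    per_row = width // cell  # how many bit cells fit on one caption row
    if per_row == 0:
        raise ValueError("cell is wider than the image")

    out = [list(row) for row in image]  # leave the input untouched
    for start in range(0, len(bits), per_row):
        strip: List[int] = []
        for b in bits[start:start + per_row]:
            strip.extend([0 if b else 255] * cell)
        strip.extend([255] * (width - len(strip)))  # pad to full width
        for _ in range(cell):  # each cell is `cell` pixels tall
            out.append(list(strip))
    return out


# Hypothetical toy input: a blank 4x6 image and a 4-bit vector.
blank = [[255] * 6 for _ in range(4)]
captioned = caption_image(blank, [1, 0, 1, 1], cell=2)
```

With this layout a convolutional model sees the drawn structure and the bit pattern in a single input tensor, which is the general intuition behind combining the two representations; the actual rendering used in the paper may differ.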
