BERT Learns (and Teaches) Chemistry
Josh Payne, Mario Srouji, Dian Ang Yap, Vineet Kosaraju

TL;DR
This paper leverages a transformer-based model (BERT) to analyze molecular structures, identify important substructures, and improve predictions of chemical properties, aiding chemists in understanding and designing molecules.
Contribution
It introduces a novel application of attention mechanisms in BERT for studying chemical substructures and demonstrates their utility in property prediction and visualization.
Findings
Attention heads identify key functional groups affecting properties
BERT-based representations improve property prediction accuracy
Attention visualization aids chemists in understanding molecular features
Abstract
Modern computational organic chemistry is becoming increasingly data-driven. There remain a large number of important unsolved problems in this area such as product prediction given reactants, drug discovery, and metric-optimized molecule synthesis, but efforts to solve these problems using machine learning have also increased in recent years. In this work, we propose the use of attention to study functional groups and other property-impacting molecular substructures from a data-driven perspective, using a transformer-based model (BERT) on datasets of string representations of molecules and analyzing the behavior of its attention heads. We then apply the representations of functional groups and atoms learned by the model to tackle problems of toxicity, solubility, drug-likeness, and synthesis accessibility on smaller datasets using the learned representations as features for graph…
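As a rough illustration of what "analyzing the behavior of attention heads" over a molecule string can involve, the sketch below splits a string into tokens and computes one row of scaled dot-product attention weights. It assumes the string representation is SMILES (the abstract only says "string representations of molecules"). The regex tokenizer, the function names, and the toy two-dimensional vectors are hypothetical stand-ins, not the authors' tokenizer (the method tags list WordPiece) or learned embeddings.

```python
import math
import re

# Assumption: molecules are written as SMILES. This simplified regex is a
# hypothetical tokenizer, not the WordPiece vocabulary a BERT model would use.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[A-Za-z]|\d|[=#()+\-/\\@%.]")


def tokenize(smiles):
    """Split a SMILES string into atom, bond, ring-closure, and branch tokens."""
    return SMILES_TOKEN.findall(smiles)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_row(query, keys):
    """Attention weights of one query vector over all key vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def top_attended(tokens, weights, k=2):
    """Return the k (token, weight) pairs with the largest attention weight."""
    ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Acetic acid; the vectors below are made up purely for illustration.
tokens = tokenize("CC(=O)O")
toy_keys = [[1.0, 0.0] if t == "O" else [0.0, 1.0] for t in tokens]
weights = attention_row([1.0, 0.0], toy_keys)
print(top_attended(tokens, weights))
```

In a real model the query and key vectors come from a trained attention head, so which tokens receive high weight is learned rather than hand-set as it is here.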
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Advanced Graph Neural Networks
Methods: Linear Layer · Dense Connections · WordPiece · Residual Connection · Linear Warmup With Linear Decay · Layer Normalization · Attention Is All You Need · Adam · Dropout
