Molecule Attention Transformer

Łukasz Maziarka; Tomasz Danel; Sławomir Mucha; Krzysztof Rataj; Jacek Tabor; Stanisław Jastrzębski

arXiv:2002.08264·cs.LG·February 10, 2021·108 cites


TL;DR

The Molecule Attention Transformer (MAT) is a novel neural network architecture that enhances Transformer attention with molecular structure information, achieving competitive and interpretable results across various molecular property prediction tasks.

Contribution

We introduce MAT, which incorporates inter-atomic distances and molecular graphs into Transformer attention, enabling competitive performance with minimal hyperparameter tuning.

Findings

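01. With simple self-supervised pretraining, MAT reaches state-of-the-art performance on downstream tasks while requiring tuning of only a few hyperparameter values.

02. The attention weights learned by MAT are interpretable from a chemical point of view.

03. MAT performs competitively across a diverse set of molecular property prediction tasks.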

Abstract

Designing a single neural network architecture that performs competitively across a range of molecule property prediction tasks remains largely an open challenge, and its solution may unlock a widespread use of deep learning in the drug discovery industry. To move towards this goal, we propose Molecule Attention Transformer (MAT). Our key innovation is to augment the attention mechanism in Transformer using inter-atomic distances and the molecular graph structure. Experiments show that MAT performs competitively on a diverse set of molecular prediction tasks. Most importantly, with a simple self-supervised pretraining, MAT requires tuning of only a few hyperparameter values to achieve state-of-the-art performance on downstream tasks. Finally, we show that attention weights learned by MAT are interpretable from the chemical point of view.
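The abstract describes the key idea only in words: attention is augmented with inter-atomic distances and the molecular graph. A minimal NumPy sketch of one way to do this for a single head is given below. It is an illustration under stated assumptions, not the authors' implementation: the function name `molecule_attention`, the mixing weights `lam_attn`, `lam_dist`, `lam_graph` and their default values, and the choice of a row-wise softmax over negative distances are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def molecule_attention(q, k, v, dist, adj,
                       lam_attn=0.4, lam_dist=0.3, lam_graph=0.3):
    """Single-head attention mixed with distance and graph terms.

    q, k, v : (n_atoms, d) query, key and value matrices
    dist    : (n_atoms, n_atoms) inter-atomic distance matrix
    adj     : (n_atoms, n_atoms) adjacency matrix of the molecular graph
    lam_*   : mixing weights (hypothetical values, assumed for this sketch)
    """
    d_k = q.shape[-1]
    # Standard scaled dot-product attention weights.
    attn = softmax(q @ k.T / np.sqrt(d_k))
    # Assumption: closer atoms get larger weight via softmax over -distance.
    dist_term = softmax(-dist)
    # Weighted sum of the three sources of structure.
    weights = lam_attn * attn + lam_dist * dist_term + lam_graph * adj
    return weights @ v, weights


# Toy input: a 3-atom chain (atom 1 is bonded to atoms 0 and 2).
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(3, 4)) for _ in range(3))
dist = np.array([[0.0, 1.0, 1.5],
                 [1.0, 0.0, 1.0],
                 [1.5, 1.0, 0.0]])
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
out, weights = molecule_attention(q, k, v, dist, adj)
```

Setting `lam_dist` and `lam_graph` to zero and `lam_attn` to one recovers ordinary scaled dot-product attention, which makes the role of the two structural terms easy to isolate. Note that the adjacency term is not row-normalized in this sketch, so the mixed weights need not sum to one per row.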


Taxonomy

Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Protein Structure and Dynamics

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax