Molecule Attention Transformer
Łukasz Maziarka, Tomasz Danel, Sławomir Mucha, Krzysztof Rataj, Jacek Tabor, Stanisław Jastrzębski

TL;DR
The Molecule Attention Transformer (MAT) is a novel neural network architecture that enhances Transformer attention with molecular structure information, achieving competitive and interpretable results across various molecular property prediction tasks.
Contribution
We introduce MAT, which incorporates inter-atomic distances and molecular graphs into Transformer attention, enabling competitive performance with minimal hyperparameter tuning.
Findings
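With simple self-supervised pretraining, MAT reaches state-of-the-art performance on downstream tasks while tuning only a few hyperparameter values.
The attention weights MAT learns are interpretable from a chemical point of view.
MAT performs competitively across a diverse set of molecular prediction tasks.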
Abstract
Designing a single neural network architecture that performs competitively across a range of molecule property prediction tasks remains largely an open challenge, and its solution may unlock a widespread use of deep learning in the drug discovery industry. To move towards this goal, we propose Molecule Attention Transformer (MAT). Our key innovation is to augment the attention mechanism in Transformer using inter-atomic distances and the molecular graph structure. Experiments show that MAT performs competitively on a diverse set of molecular prediction tasks. Most importantly, with a simple self-supervised pretraining, MAT requires tuning of only a few hyperparameter values to achieve state-of-the-art performance on downstream tasks. Finally, we show that attention weights learned by MAT are interpretable from the chemical point of view.
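The idea of augmenting attention with inter-atomic distances and the molecular graph can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `molecule_attention`, the mixing weights `w_att`, `w_dist`, `w_adj`, and the choice of a softmax over negative distances are assumptions made for illustration, and the code is plain Python so that it runs without dependencies.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def molecule_attention(Q, K, V, dist, adj, w_att=0.5, w_dist=0.25, w_adj=0.25):
    """Blend self-attention with distance and graph adjacency information.

    Q, K, V: per-atom query/key/value vectors (lists of lists, n x d).
    dist:    n x n matrix of inter-atomic distances.
    adj:     n x n adjacency matrix of the molecular graph (0/1 entries).
    w_*:     hypothetical mixing weights, treated here as hyperparameters.
    """
    n, d = len(Q), len(Q[0])
    out = []
    for i in range(n):
        # Standard scaled dot-product attention weights for atom i.
        scores = [sum(q * k for q, k in zip(Q[i], K[j])) / math.sqrt(d)
                  for j in range(n)]
        att = softmax(scores)
        # Assumed distance term: nearer atoms receive larger weight.
        near = softmax([-x for x in dist[i]])
        # Combine the three sources into one weight per atom j.
        # The adjacency row is used unnormalized, so weights need not sum to 1.
        w = [w_att * att[j] + w_dist * near[j] + w_adj * adj[i][j]
             for j in range(n)]
        # Weighted sum of the value vectors.
        out.append([sum(w[j] * V[j][c] for j in range(n))
                    for c in range(len(V[0]))])
    return out


# Toy three-atom chain (atom 1 bonded to atoms 0 and 2); all numbers are made up.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
distances = [[0.0, 1.0, 1.6], [1.0, 0.0, 1.0], [1.6, 1.0, 0.0]]
bonds = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(molecule_attention(features, features, features, distances, bonds))
```

Setting `w_dist` and `w_adj` to zero recovers ordinary scaled dot-product attention, which makes it easy to see what the structural terms add.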
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Protein Structure and Dynamics
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
