Chemical transformer compression for accelerating both training and inference of molecular modeling

Yi Yu; Karl Borjesson

arXiv:2205.07582·cs.LG·May 17, 2022

TL;DR

This paper introduces DeLiCaTe, a compressed chemical transformer model that significantly accelerates training and inference in molecular modeling while maintaining high predictive performance.

Contribution

It presents a novel combination of cross-layer parameter sharing and knowledge distillation to create a lightweight transformer for molecular science.
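As a rough illustration of why cross-layer parameter sharing shrinks a model, the sketch below counts the weights in a stack of standard transformer encoder layers with and without sharing. The layer sizes are hypothetical BERT-base-like values chosen for the example, not the configuration reported in the paper, and embeddings and task heads are left out of the count.

```python
def encoder_layer_params(hidden: int, ffn: int) -> int:
    """Weights and biases in one standard transformer encoder layer."""
    attention = 4 * (hidden * hidden + hidden)  # Q, K, V and output projections
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    layer_norms = 2 * (2 * hidden)  # two LayerNorms, each with scale and shift
    return attention + feed_forward + layer_norms


def encoder_params(hidden: int, ffn: int, layers: int, shared: bool) -> int:
    """Total encoder parameters; with sharing, every layer reuses one weight set."""
    per_layer = encoder_layer_params(hidden, ffn)
    return per_layer if shared else per_layer * layers


if __name__ == "__main__":
    # Assumed sizes for illustration only (not taken from the paper).
    hidden, ffn, layers = 768, 3072, 12
    unshared = encoder_params(hidden, ffn, layers, shared=False)
    shared = encoder_params(hidden, ffn, layers, shared=True)
    print(f"unshared: {unshared:,}  shared: {shared:,}  ratio: {unshared / shared:.0f}x")
```

Sharing reduces stored parameters but not the number of layer applications in a forward pass, which is presumably why the paper pairs it with distillation into a shallower network to gain speed.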

Findings

01

DeLiCaTe achieves 4x faster training and inference.

02

It reduces parameters by 10 times and layers by 3 times.

03

It maintains comparable QSAR and virtual screening performance.

Abstract

Transformer models have been developed in molecular science with excellent performance in applications including quantitative structure-activity relationship (QSAR) modeling and virtual screening (VS). Compared with other types of models, however, they are large, so they require substantial hardware to keep training and inference times short. In this work, cross-layer parameter sharing (CLPS) and knowledge distillation (KD) are used to reduce the size of transformers in molecular science. Both methods achieve QSAR predictive performance competitive with the original BERT model while being more parameter efficient. Furthermore, by integrating CLPS and KD into a two-state chemical network, we introduce a new deep lite chemical transformer model, DeLiCaTe. DeLiCaTe captures general-domain as well as task-specific knowledge, which leads to a 4x faster rate…
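For readers unfamiliar with knowledge distillation, a minimal sketch of a common soft-target formulation follows: the student is trained to match the teacher's temperature-softened output distribution. This is an assumption made for illustration; the abstract does not state which distillation objective the authors use, and the function names and default temperature here are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    student = [1.0, 1.0, 0.0]
    print(distillation_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as the two distributions diverge. In practice it is often combined with the ordinary task loss on the true labels.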

Code & Models

Repositories

yiyudl/delicate (PyTorch, official)

Taxonomy

Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Innovative Microfluidic and Catalytic Techniques Innovation

Methods: Attention Is All You Need · Linear Layer · Weight Decay · Softmax · Multi-Head Attention · Attention Dropout · Layer Normalization · Dropout · Dense Connections