Exploring Low-Cost Transformer Model Compression for Large-Scale Commercial Reply Suggestions

Vaishnavi Shrivastava; Radhika Gaonkar; Shashank Gupta; Abhishek Jha

arXiv:2111.13999·cs.CL·November 30, 2021

TL;DR

This paper investigates low-cost model compression methods, such as Layer Dropping and Layer Freezing, to reduce training time for large-scale commercial reply suggestion systems while maintaining relevance and user engagement.

Contribution

The paper demonstrates that low-cost compression techniques reduce training time in large-data scenarios without sacrificing model quality.

Findings

01 Training time reduced by 42%

02 Model relevance and user engagement unaffected

03 Robustness confirmed across different datasets and model sizes

Abstract

Fine-tuning pre-trained language models improves the quality of commercial reply suggestion systems, but at the cost of unsustainable training times. Popular training time reduction approaches are resource intensive, so we explore low-cost model compression techniques like Layer Dropping and Layer Freezing. We demonstrate the efficacy of these techniques in large-data scenarios, reducing the training time of a commercial email reply suggestion system by 42% without affecting model relevance or user engagement. We further study the robustness of these techniques to pre-trained model and dataset size ablation, and share several insights and recommendations for commercial applications.
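The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Layer` class, the 12-layer model, the per-layer parameter count, and the choice to drop the top layers and freeze the bottom ones are all assumptions made for illustration, since the abstract does not say which layers are dropped or frozen. The idea shown is that Layer Dropping removes layers from the pre-trained model before fine-tuning, while Layer Freezing keeps every layer but excludes some from gradient updates.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Layer:
    """Hypothetical stand-in for one Transformer encoder layer."""
    index: int
    num_params: int
    trainable: bool = True


def drop_top_layers(layers: List[Layer], keep: int) -> List[Layer]:
    """Layer Dropping (assumed variant): keep only the bottom `keep` layers."""
    if not 0 < keep <= len(layers):
        raise ValueError("keep must be between 1 and the number of layers")
    return layers[:keep]


def freeze_bottom_layers(layers: List[Layer], n_frozen: int) -> List[Layer]:
    """Layer Freezing (assumed variant): stop updating the bottom `n_frozen` layers."""
    if not 0 <= n_frozen <= len(layers):
        raise ValueError("n_frozen must be between 0 and the number of layers")
    return [
        replace(layer, trainable=layer.trainable and position >= n_frozen)
        for position, layer in enumerate(layers)
    ]


def trainable_params(layers: List[Layer]) -> int:
    """Count the parameters that would still receive gradient updates."""
    return sum(layer.num_params for layer in layers if layer.trainable)


# Made-up model: 12 layers with 7,000,000 parameters each (illustrative numbers only).
pretrained = [Layer(index=i, num_params=7_000_000) for i in range(12)]

dropped = drop_top_layers(pretrained, keep=6)          # smaller model, all layers trained
frozen = freeze_bottom_layers(pretrained, n_frozen=6)  # same depth, fewer layers trained

print(len(dropped), trainable_params(dropped))
print(len(frozen), trainable_params(frozen))
```

In this toy setup both variants leave the same number of trainable parameters, but only Layer Dropping also shortens the forward pass, because the frozen layers are still evaluated. The sketch says nothing about how either choice affects the 42% figure reported in the abstract.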

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Healthcare