TorchScale: Transformers at Scale

Shuming Ma; Hongyu Wang; Shaohan Huang; Wenhui Wang; Zewen Chi; Li Dong; Alon Benhaim; Barun Patra; Vishrav Chaudhary; Xia Song; Furu Wei

arXiv:2211.13184·cs.LG·November 24, 2022·6 cites

TL;DR

TorchScale is an open-source toolkit that enables efficient and effective scaling of Transformer models, incorporating various modeling techniques to improve performance, stability, and scalability across tasks like language modeling and translation.

Contribution

The paper introduces TorchScale, an open-source toolkit that implements several modeling techniques for scaling Transformers, with a focus on training stability and efficiency.

Findings

1. Successful scaling of Transformers to various sizes
2. Improved training stability and efficiency
3. Enhanced modeling generality and capability

Abstract

Large Transformers have achieved state-of-the-art performance across many tasks. Most open-source libraries on scaling Transformers focus on improving training or inference with better parallelization. In this work, we present TorchScale, an open-source toolkit that allows researchers and developers to scale up Transformers efficiently and effectively. TorchScale has the implementation of several modeling techniques, which can improve modeling generality and capability, as well as training stability and efficiency. Experimental results on language modeling and neural machine translation demonstrate that TorchScale can successfully scale Transformers to different sizes without tears. The library is available at https://aka.ms/torchscale.

Code & Models

Repositories

microsoft/torchscale (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques

Methods: Lib