Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks

Gen Luo; Yiyi Zhou; Xiaoshuai Sun; Yan Wang; Liujuan Cao; Yongjian Wu; Feiyue Huang; Rongrong Ji

arXiv:2204.07780 · cs.CV · May 25, 2022

TL;DR

This paper introduces LW-Transformer, a lightweight Transformer model using Group-wise Transformation to reduce parameters and computation while maintaining performance on vision-and-language tasks.

Contribution

The paper proposes a universal lightweight Transformer architecture that preserves the key properties of standard Transformers and applies to a range of vision-and-language tasks.

Findings

01. Significant reduction in parameters and computation.

02. Competitive performance on multiple vision-and-language benchmarks.

03. Effective generalization to image classification with Swin-Transformer.

Abstract

Despite its impressive performance, the Transformer is criticized for its excessive parameters and computation cost. However, compressing the Transformer remains an open problem due to the internal complexity of its layer designs, i.e., Multi-Head Attention (MHA) and the Feed-Forward Network (FFN). To address this issue, we introduce Group-wise Transformation towards a universal yet lightweight Transformer for vision-and-language tasks, termed LW-Transformer. LW-Transformer applies Group-wise Transformation to reduce both the parameters and computations of the Transformer, while also preserving its two main properties, i.e., the efficient attention modeling on diverse subspaces of MHA, and the expanding-scaling feature transformation of FFN. We apply LW-Transformer to a set of Transformer-based networks, and quantitatively evaluate them on three vision-and-language tasks and six benchmark datasets.…
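The abstract does not spell out the mechanics, so the following is only a minimal sketch, assuming that a group-wise transformation splits a feature vector into G equal groups and applies an independent, smaller linear projection to each group (equivalently, a block-diagonal weight matrix). It is not the authors' implementation (the official PyTorch code is in the repository listed below). The helper names `linear_params`, `group_linear_params`, `group_linear` and `block_diagonal` are hypothetical, and the sizes d = 512 and the 4x FFN expansion are illustrative assumptions borrowed from the standard Transformer setup. It shows why parameters drop by roughly a factor of G, both for a d-to-d attention projection and for an expand-then-scale d-to-4d-to-d FFN.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear_params(d_in: int, d_out: int) -> int:
    """Weights plus biases of a dense d_in -> d_out linear layer."""
    return d_in * d_out + d_out


def group_linear_params(d_in: int, d_out: int, groups: int) -> int:
    """Weights plus biases of `groups` independent (d_in/groups) -> (d_out/groups) projections."""
    if d_in % groups or d_out % groups:
        raise ValueError("feature sizes must be divisible by the number of groups")
    gi, go = d_in // groups, d_out // groups
    return groups * (gi * go + go)


def matvec(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Compute w @ x + b for a row-major matrix w."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def group_linear(x: Vector, weights: List[Matrix], biases: List[Vector]) -> Vector:
    """Split x into equal chunks, project each chunk with its own weights, concatenate."""
    size = len(x) // len(weights)
    out: Vector = []
    for k, (w, b) in enumerate(zip(weights, biases)):
        out.extend(matvec(w, b, x[k * size:(k + 1) * size]))
    return out


def block_diagonal(weights: List[Matrix]) -> Matrix:
    """Dense matrix equivalent to a group-wise layer: group weights on the diagonal, zeros elsewhere."""
    d_in = sum(len(w[0]) for w in weights)
    dense: Matrix = []
    col = 0
    for w in weights:
        for row in w:
            dense.append([0.0] * col + list(row) + [0.0] * (d_in - col - len(row)))
        col += len(w[0])
    return dense


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Uniform random matrix, used only for the toy sanity check."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    d, G = 512, 2
    # Dense vs. group-wise d -> d projection (e.g. one attention projection).
    print(linear_params(d, d), group_linear_params(d, d, G))  # 262656 131584
    # Dense vs. group-wise expand-then-scale FFN, d -> 4d -> d.
    dense_ffn = linear_params(d, 4 * d) + linear_params(4 * d, d)
    group_ffn = group_linear_params(d, 4 * d, G) + group_linear_params(4 * d, d, G)
    print(dense_ffn, group_ffn)  # 2099712 1051136

    # Toy check: the group-wise output equals that of a block-diagonal dense layer.
    rng = random.Random(0)
    ws = [random_matrix(3, 2, rng) for _ in range(G)]
    bs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(G)]
    x = [rng.uniform(-1, 1) for _ in range(2 * G)]
    y_group = group_linear(x, ws, bs)
    y_dense = matvec(block_diagonal(ws), [v for b in bs for v in b], x)
    print(all(abs(a - b) < 1e-12 for a, b in zip(y_group, y_dense)))  # True
```

With G = 2 the weight count of each layer halves, while the bias count is unchanged. Stacking such block-diagonal layers alone never mixes features across groups; how LW-Transformer arranges its layers around this is covered in the paper and is not modeled in this sketch.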

Code & Models

Repositories

luogen1996/lwtransformer
PyTorch · Official

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Softmax · Dropout · Label Smoothing · Adam · Residual Connection · Absolute Position Encodings · Byte Pair Encoding · Position-Wise Feed-Forward Layer