Supplementary Material: Implementation and Experiments for GAU-based Model

Zhenjie Liu

arXiv:2205.05842·cs.CL·May 19, 2022·1 cites

Open Access

TL;DR

This paper re-analyzes implementation details of FLASH, a GAU-based Transformer variant, proposes a new GAU-based model, and pre-trains it on Chinese data. On the CLUE benchmark the model reaches a dev average of 75.02, 1% higher than RoFormerV1 while being 45% faster.

Contribution

It re-analyzes GAU implementation details both theoretically and practically, introduces a novel GAU-based model, and reports improved speed and accuracy over RoFormerV1 on the CLUE benchmark.

Findings

01

The model achieves a dev average score of 75.02 on the CLUE benchmark.

02

The model is 45% faster than RoFormerV1.

03

Pre-trained on a Chinese corpus, the model is also competitive with RoFormerV2.

Abstract

In February this year, Google proposed a new Transformer variant called FLASH, which is faster, has a lower VRAM footprint, and performs better. This is achieved by a performant layer named GAU (Gated Attention Unit), which combines the attention layer and the FFN. In this paper, some implementation details are re-analyzed both theoretically and practically. We then propose a novel GAU-based model and pre-train it on a Chinese corpus. Results on the CLUE benchmark show that our model achieves a dev average score of 75.02, 1% higher than RoFormerV1 while being 45% faster, which is also competitive with RoFormerV2.
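
The abstract describes GAU as a single layer that merges attention and the FFN. A minimal NumPy sketch of that idea is given below. It follows the general single-head form associated with FLASH, not the specific model proposed in this paper. The names `gau`, `init_params`, and the parameter keys are hypothetical, and the SiLU activation, squared-ReLU scoring, and 1/n scaling are assumptions about the original formulation. Normalization, residual connections, and position bias are left out for brevity.

```python
import numpy as np


def silu(x):
    """SiLU / swish activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def init_params(d, e, s, seed=0):
    """Random weights for one GAU layer (hypothetical helper).

    d: model width, e: expanded width of the gate/value branches,
    s: small width of the shared query/key base.
    """
    rng = np.random.default_rng(seed)
    return {
        "w_u": rng.normal(scale=d ** -0.5, size=(d, e)),
        "w_v": rng.normal(scale=d ** -0.5, size=(d, e)),
        "w_z": rng.normal(scale=d ** -0.5, size=(d, s)),
        "w_o": rng.normal(scale=e ** -0.5, size=(e, d)),
        # Per-dimension scale and offset that turn the shared base z into q and k.
        "gamma_q": np.ones(s), "beta_q": np.zeros(s),
        "gamma_k": np.ones(s), "beta_k": np.zeros(s),
    }


def gau(x, p):
    """Single-head gated attention unit applied to x of shape (n, d)."""
    n = x.shape[0]
    u = silu(x @ p["w_u"])                  # (n, e) gate branch
    v = silu(x @ p["w_v"])                  # (n, e) value branch
    z = silu(x @ p["w_z"])                  # (n, s) shared base for q and k
    q = z * p["gamma_q"] + p["beta_q"]      # (n, s)
    k = z * p["gamma_k"] + p["beta_k"]      # (n, s)
    # Assumed scoring: squared ReLU of length-scaled dot products, no softmax.
    a = np.maximum(q @ k.T / n, 0.0) ** 2   # (n, n)
    # The gate u multiplies the attended values elementwise, which is what
    # lets one layer play the role of both attention and a gated FFN.
    return (u * (a @ v)) @ p["w_o"]         # (n, d)


if __name__ == "__main__":
    params = init_params(d=8, e=16, s=4)
    tokens = np.random.default_rng(1).normal(size=(5, 8))
    print(gau(tokens, params).shape)
```

In this sketch the only attention-specific parameters are the small `w_z` projection and the per-dimension scales and offsets, while `w_u`, `w_v`, and `w_o` resemble the projections of a gated feed-forward block.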

Taxonomy

Topics: Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Byte Pair Encoding · Absolute Position Encodings · Residual Connection · Dense Connections · Label Smoothing · Dropout