EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation

Tao Ge; Si-Qing Chen; Furu Wei

arXiv:2202.07959 · cs.CL · January 2, 2023

TL;DR

EdgeFormer is a novel, parameter-efficient Transformer designed for on-device sequence-to-sequence generation, outperforming previous models under strict resource constraints and enabling practical on-device NLP applications.

Contribution

The paper introduces two new principles for cost-effective parameterization and a layer adaptation technique for networks with shared layers, improving on-device Transformer performance.
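To make the parameter budget behind shared layers concrete, here is a minimal sketch that counts weights for a stack of distinct layers versus one shared layer with a small per-layer addition. It is not the paper's method: the function names, the layer sizes, and the choice of one `d_model`-sized vector per layer as the "adaptation" are all assumptions for illustration, and biases, layer norm, and cross-attention are ignored.

```python
def layer_params(d_model: int, d_ff: int) -> int:
    """Weights in one self-attention + feed-forward layer (biases and layer norm ignored)."""
    attention = 4 * d_model * d_model   # Q, K, V and output projections
    feed_forward = 2 * d_model * d_ff   # two linear maps: d_model -> d_ff -> d_model
    return attention + feed_forward


def stack_params(num_layers: int, d_model: int, d_ff: int,
                 shared: bool = False, adapter_params_per_layer: int = 0) -> int:
    """Total weights for a stack of layers.

    With shared=True, one set of layer weights is reused at every depth and
    each depth adds only `adapter_params_per_layer` weights of its own
    (a hypothetical stand-in for a layer-specific adaptation).
    """
    base = layer_params(d_model, d_ff)
    if shared:
        return base + num_layers * adapter_params_per_layer
    return num_layers * base


# Illustrative sizes only; they are not taken from the paper.
D_MODEL, D_FF, LAYERS = 512, 2048, 6

unshared = stack_params(LAYERS, D_MODEL, D_FF)
shared_adapted = stack_params(LAYERS, D_MODEL, D_FF,
                              shared=True, adapter_params_per_layer=D_MODEL)

print(f"distinct layers:         {unshared:,} weights")
print(f"shared layer + adapters: {shared_adapted:,} weights")
```

Under these assumed sizes, sharing cuts the stack's weight count by roughly the number of layers, and the per-layer additions cost very little in comparison. That is the general trade-off a layer adaptation technique works within.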

Findings

01. Outperforms previous parameter-efficient Transformers

02. Achieves competitive results under resource constraints

03. First publicly available pretrained on-device seq2seq model

Abstract

We introduce EdgeFormer -- a parameter-efficient Transformer for on-device seq2seq generation under strict computation and memory constraints. Compared with previous parameter-efficient Transformers, EdgeFormer applies two novel principles for cost-effective parameterization, allowing it to perform better given the same parameter budget; moreover, EdgeFormer is further enhanced by a layer adaptation technique proposed to improve networks with shared layers. Extensive experiments show that EdgeFormer can effectively outperform previous parameter-efficient Transformer baselines and achieve competitive results under both the computation and memory constraints. Given the promising results, we release EdgeLM -- the pretrained version of EdgeFormer, which is the first publicly available pretrained on-device seq2seq model that can be easily fine-tuned for seq2seq tasks with…

Code & Models

Repositories

microsoft/unilm (PyTorch, official)

Taxonomy

Topics: Advanced biosensing and bioanalysis techniques · Biosensors and Analytical Detection · Electrowetting and Microfluidic Technologies

Methods: Attention Is All You Need · Linear Layer · Tanh Activation · Sigmoid Activation · Softmax · Multi-Head Attention · Byte Pair Encoding · Dense Connections · Long Short-Term Memory · Absolute Position Encodings