EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
Tao Ge, Si-Qing Chen, Furu Wei

TL;DR
EdgeFormer is a novel, parameter-efficient Transformer designed for on-device sequence-to-sequence generation, outperforming previous models under strict resource constraints and enabling practical on-device NLP applications.
Contribution
It introduces two principles for cost-effective parameterization and a layer adaptation technique for networks with shared layers, allowing EdgeFormer to perform better than previous parameter-efficient Transformers given the same parameter budget.
Findings
Outperforms previous parameter-efficient Transformers
Achieves competitive results under both computation and memory constraints
Releases EdgeLM, the first publicly available pretrained on-device seq2seq model
Abstract
We introduce EdgeFormer -- a parameter-efficient Transformer for on-device seq2seq generation under strict computation and memory constraints. Compared with previous parameter-efficient Transformers, EdgeFormer applies two novel principles for cost-effective parameterization, allowing it to perform better given the same parameter budget; moreover, EdgeFormer is further enhanced by layer adaptation, which is proposed for improving networks with shared layers. Extensive experiments show that EdgeFormer can effectively outperform previous parameter-efficient Transformer baselines and achieve competitive results under both the computation and memory constraints. Given the promising results, we release EdgeLM -- the pretrained version of EdgeFormer, which is the first publicly available pretrained on-device seq2seq model that can be easily fine-tuned for seq2seq tasks with…
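To make the "same parameter budget" framing concrete, the sketch below counts the weights in a stack of Transformer layers with and without cross-layer sharing. It is only an illustration under stated assumptions: the dimensions (512, 2048, 6 layers), the function names, and the per-layer adaptation vector of size d_model are hypothetical choices made for this example. They are not taken from the paper and do not describe EdgeFormer's actual parameterization principles or its layer adaptation method.

```python
def layer_params(d_model: int, d_ffn: int) -> int:
    """Weights and biases of one generic Transformer layer (LayerNorm omitted)."""
    # Four projections in self-attention: query, key, value, output.
    attention = 4 * (d_model * d_model + d_model)
    # Two linear maps in the feed-forward block: d_model -> d_ffn -> d_model.
    ffn = (d_model * d_ffn + d_ffn) + (d_ffn * d_model + d_model)
    return attention + ffn


def stack_params(num_layers: int, d_model: int, d_ffn: int,
                 shared: bool = False, adapt_per_layer: int = 0) -> int:
    """Parameter count of a layer stack.

    shared=True stores one set of layer weights reused by every layer.
    adapt_per_layer is a hypothetical number of layer-specific parameters
    added on top of the shared weights (an assumption for illustration).
    """
    base = layer_params(d_model, d_ffn)
    if shared:
        return base + num_layers * adapt_per_layer
    return num_layers * base


# Hypothetical sizes chosen for the example only.
D_MODEL, D_FFN, NUM_LAYERS = 512, 2048, 6

unshared = stack_params(NUM_LAYERS, D_MODEL, D_FFN)
shared = stack_params(NUM_LAYERS, D_MODEL, D_FFN, shared=True)
shared_adapted = stack_params(NUM_LAYERS, D_MODEL, D_FFN,
                              shared=True, adapt_per_layer=D_MODEL)

print(f"unshared stack:        {unshared:,}")
print(f"shared stack:          {shared:,}")
print(f"shared + adaptation:   {shared_adapted:,}")
```

In this toy setting, sharing cuts the stack to one layer's worth of weights, and giving each layer a small set of its own parameters adds only a few thousand more. That is the general trade-off that work on shared-layer networks operates within.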
Taxonomy
Topics: Advanced biosensing and bioanalysis techniques · Biosensors and Analytical Detection · Electrowetting and Microfluidic Technologies
Methods: Attention Is All You Need · Linear Layer · Tanh Activation · Sigmoid Activation · Softmax · Multi-Head Attention · Byte Pair Encoding · Dense Connections · Long Short-Term Memory · Absolute Position Encodings
