pNLP-Mixer: an Efficient all-MLP Architecture for Language

Francesco Fusco; Damian Pascual; Peter Staar; Diego Antognini

arXiv:2202.04350 · cs.CL · May 26, 2023 · 1 citation

TL;DR

The paper introduces pNLP-Mixer, a compact all-MLP architecture for on-device NLP. A one-megabyte model reaches up to 99.4% of mBERT's performance with 170x fewer parameters, which makes it suitable for constrained devices such as smart watches.

Contribution

It proposes an embedding-free MLP-Mixer model whose novel projection layer gives it high weight-efficiency for on-device NLP applications.
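As a rough illustration of what an MLP-Mixer block computes, the following pure-Python sketch applies one token-mixing MLP and one channel-mixing MLP with residual connections. It follows the generic MLP-Mixer layout and is not the authors' implementation: it does not reproduce the paper's projection layer, the layer norm omits learnable scale and shift, and every name and dimension here (`mixer_block`, `init_mlp`, `S`, `C`, the hidden sizes) is a hypothetical choice made for the example.

```python
import math
import random


def linear(x, w, b):
    """Compute y = x @ w + b for one vector x; w has shape (len(x), len(b))."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]


def gelu(v):
    """Exact GELU activation, applied elementwise."""
    return [0.5 * t * (1.0 + math.erf(t / math.sqrt(2.0))) for t in v]


def mlp(x, params):
    """Two-layer MLP: linear -> GELU -> linear, mapping back to len(x)."""
    w1, b1, w2, b2 = params
    return linear(gelu(linear(x, w1, b1)), w2, b2)


def layer_norm(v, eps=1e-5):
    """Normalise one vector to zero mean and unit variance (no learned affine)."""
    mean = sum(v) / len(v)
    var = sum((t - mean) ** 2 for t in v) / len(v)
    return [(t - mean) / math.sqrt(var + eps) for t in v]


def transpose(m):
    return [list(col) for col in zip(*m)]


def init_mlp(rng, d_in, d_hidden):
    """Hypothetical initialiser: small Gaussian weights, zero biases."""
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]
    return (mat(d_in, d_hidden), [0.0] * d_hidden,
            mat(d_hidden, d_in), [0.0] * d_in)


def mixer_block(x, token_mlp, channel_mlp):
    """x is a list of S token vectors, each with C features."""
    # Token mixing: the MLP runs along the sequence axis, once per channel.
    normed = [layer_norm(tok) for tok in x]
    mixed = transpose([mlp(col, token_mlp) for col in transpose(normed)])
    x = [[a + b for a, b in zip(row, m)] for row, m in zip(x, mixed)]
    # Channel mixing: the MLP runs along the feature axis, once per token.
    normed = [layer_norm(tok) for tok in x]
    mixed = [mlp(tok, channel_mlp) for tok in normed]
    return [[a + b for a, b in zip(row, m)] for row, m in zip(x, mixed)]


rng = random.Random(0)
S, C = 4, 6  # hypothetical sequence length and feature width
x = [[rng.gauss(0.0, 1.0) for _ in range(C)] for _ in range(S)]
out = mixer_block(x, init_mlp(rng, S, 8), init_mlp(rng, C, 12))
```

The block keeps the input shape, so several such blocks can be stacked. Note that neither MLP's parameter count depends on a vocabulary, which is consistent with the weight-efficiency claim for an embedding-free model.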

Findings

01. Achieves 99.4% and 97.8% of mBERT's performance on the MTOP and multiATIS datasets.

02. Uses 170x fewer parameters than mBERT.

03. Outperforms state-of-the-art tiny models by up to 7.8%.
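The parameter and size figures in this summary can be related with some simple arithmetic. The sketch below counts the weights of one generic MLP-Mixer block and converts a parameter count into megabytes at a given bit width. The layer dimensions are hypothetical and are not taken from the paper, and the helper names (`mlp_params`, `mixer_block_params`, `size_megabytes`) are made up for this example.

```python
def mlp_params(d_in, d_hidden):
    """Weights and biases of a two-layer MLP mapping d_in -> d_hidden -> d_in."""
    return d_in * d_hidden + d_hidden + d_hidden * d_in + d_in


def mixer_block_params(seq_len, channels, token_hidden, channel_hidden):
    """Parameters of one generic mixer block (assumed layout, not the paper's)."""
    layer_norms = 2 * 2 * channels  # two layer norms, each with scale and shift
    return (mlp_params(seq_len, token_hidden)
            + mlp_params(channels, channel_hidden)
            + layer_norms)


def size_megabytes(n_params, bits_per_weight):
    """Storage needed for n_params weights, using 1 MB = 1,000,000 bytes."""
    return n_params * bits_per_weight / 8 / 1_000_000


# Hypothetical dimensions, chosen only to make the arithmetic concrete.
block = mixer_block_params(seq_len=64, channels=256,
                           token_hidden=256, channel_hidden=256)
print(block)                              # 165696
print(size_megabytes(1_000_000, 8))       # 1.0
print(size_megabytes(1_000_000, 32))      # 4.0
```

Under these assumptions, a model with about one million weights fits in one megabyte only when each weight is stored in 8 bits, which is why quantization matters for a one-megabyte size target.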

Abstract

Large pre-trained language models based on the transformer architecture have drastically changed the natural language processing (NLP) landscape. However, deploying those models for on-device applications in constrained devices such as smart watches is completely impractical due to their size and inference cost. As an alternative to transformer-based architectures, recent work on efficient NLP has shown that weight-efficient models can attain competitive performance for simple tasks, such as slot filling and intent classification, with model sizes on the order of one megabyte. This work introduces the pNLP-Mixer architecture, an embedding-free MLP-Mixer model for on-device NLP that achieves high weight-efficiency thanks to a novel projection layer. We evaluate a pNLP-Mixer model of only one megabyte in size on two multilingual semantic parsing datasets, MTOP and multiATIS. Our quantized…

Code & Models

Repositories

mindslab-ai/pnlp-mixer (PyTorch)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · mBERT · Adam · Attention Dropout · Linear Warmup With Linear Decay · WordPiece · Softmax · Weight Decay