SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers

Ameet Deshpande; Md Arafat Sultan; Anthony Ferritto; Ashwin Kalyan; Karthik Narasimhan; Avirup Sil

arXiv:2211.16634·cs.CL·December 1, 2022

TL;DR

SPARTAN introduces a hierarchical sparse memory architecture for transformers that enables efficient fine-tuning on edge devices by only updating memory components, significantly reducing storage and increasing inference speed.

Contribution

It proposes a novel hierarchical sparse memory design that allows parameter-efficient fine-tuning of pre-trained language models on edge devices, running faster than existing parameter-efficient methods while achieving comparable accuracy.

Findings

01 Over 90% inference speedup on Raspberry Pi 4

02 Outperforms PE baselines by 0.1 points on GLUE

03 Trains 34% faster in few-shot settings

Abstract

Fine-tuning pre-trained language models (PLMs) achieves impressive performance on a range of downstream tasks, and their sizes have consequently been getting bigger. Since a different copy of the model is required for each task, this paradigm is infeasible for storage-constrained edge devices like mobile phones. In this paper, we propose SPARTAN, a parameter-efficient (PE) and computationally fast architecture for edge devices that adds hierarchically organized sparse memory after each Transformer layer. SPARTAN freezes the PLM parameters and fine-tunes only its memory, thus significantly reducing storage costs by re-using the PLM backbone for different tasks. SPARTAN contains two levels of memory, with only a sparse subset of parents being chosen in the first level for each input, and children cells corresponding to those parents being used to compute an output representation. This…
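
The two-level lookup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the official repository for that): the dot-product scoring, the softmax over children, the `top_k` parameter, and every name below (`make_memory`, `memory_lookup`, `parent_keys`, `child_keys`, `child_values`) are assumptions chosen for illustration. It only shows the general idea that a few parents are selected per input and only their children contribute to the output.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length vectors stored as lists."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_memory(num_parents, children_per_parent, dim, seed=0):
    """Build a randomly initialised two-level memory (hypothetical layout)."""
    rng = random.Random(seed)

    def rand_vec():
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]

    parent_keys = [rand_vec() for _ in range(num_parents)]
    child_keys = [[rand_vec() for _ in range(children_per_parent)]
                  for _ in range(num_parents)]
    child_values = [[rand_vec() for _ in range(children_per_parent)]
                    for _ in range(num_parents)]
    return parent_keys, child_keys, child_values


def memory_lookup(h, parent_keys, child_keys, child_values, top_k):
    """Return (output vector, chosen parent indices) for one input vector h."""
    # Level 1: score every parent, then keep only the top_k (the sparse step).
    parent_scores = [dot(h, key) for key in parent_keys]
    ranked = sorted(range(len(parent_keys)),
                    key=lambda i: parent_scores[i], reverse=True)
    chosen = ranked[:top_k]

    # Level 2: gather the children of the chosen parents only.
    keys = [k for p in chosen for k in child_keys[p]]
    values = [v for p in chosen for v in child_values[p]]

    # Assumed read-out: scaled dot-product weights over the gathered children.
    scale = math.sqrt(len(h))
    weights = softmax([dot(h, k) / scale for k in keys])
    output = [sum(w * v[d] for w, v in zip(weights, values))
              for d in range(len(h))]
    return output, chosen


if __name__ == "__main__":
    pk, ck, cv = make_memory(num_parents=8, children_per_parent=4, dim=16)
    hidden = [0.1] * 16  # stand-in for a Transformer layer's hidden state
    out, chosen = memory_lookup(hidden, pk, ck, cv, top_k=2)
    print("chosen parents:", chosen)
    print("output dimension:", len(out))
```

In this sketch, with 8 parents of 4 children each and `top_k=2`, only 8 of the 32 child cells are read for a given input. In the setting the abstract describes, only the memory would be trained while the PLM stays frozen.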

Code & Models

Repositories

princeton-nlp/spartan (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Adam · Absolute Position Encodings · Softmax · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing