Efficient model compression with Random Operation Access Specific Tile (ROAST) hashing

Aditya Desai; Keren Zhou; Anshumali Shrivastava

arXiv:2207.10702·cs.LG·July 25, 2022



TL;DR

This paper introduces ROAST hashing, a cache-efficient, model-agnostic compression method that significantly reduces model size and training/inference time, enabling deployment of large models like BERT on edge devices without quality loss.

Contribution

ROAST hashing is a novel, cache-friendly model compression technique that outperforms existing methods and enables the first compressed BERT suitable for resource-constrained devices.

Findings

01. ROAST is up to 25x faster to train and 50x faster to infer than HashedNet.

02. ROAST achieves 100x to 1000x compression of BERT without quality degradation.

03. Global weight sharing in ROAST is empirically and theoretically superior to local sharing.

Abstract

Advancements in deep learning are often associated with increasing model sizes. The model size dramatically affects the deployment cost and latency of deep models. For instance, models like BERT cannot be deployed on edge devices and mobiles due to their sheer size. As a result, most advances in Deep Learning are yet to reach the edge. Model compression has sought much-deserved attention in literature across natural language processing, vision, and recommendation domains. This paper proposes a model-agnostic, cache-friendly model compression approach: Random Operation Access Specific Tile (ROAST) hashing. ROAST collapses the parameters by clubbing them through a lightweight mapping. Notably, while clubbing these parameters, ROAST utilizes cache hierarchies by aligning the memory access pattern with the parameter access pattern. ROAST is up to $\sim 25 \times$ faster to train and $\sim…
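The abstract's idea of "clubbing" parameters through a lightweight mapping, with memory accesses aligned to the parameter access pattern, can be illustrated with a small NumPy sketch. This is not the authors' implementation and does not reflect the API of the linked repository: the hash function, the tile size, the choice of row-contiguous tiles, and the names `elementwise_indices`, `tiled_indices`, and `materialize` are all assumptions made for illustration. The sketch contrasts per-weight hashing (in the spirit of HashedNet) with hashing once per tile, so that each tile of the virtual weight matrix reads a contiguous run of the shared memory.

```python
import numpy as np

PRIME = 2_147_483_647  # 2^31 - 1, used for a simple (a*x + b) mod p hash (assumed, not from the paper)


def _hash_params(seed):
    # Draw the two hash coefficients; hypothetical helper for this sketch.
    rng = np.random.default_rng(seed)
    a, b = (int(v) for v in rng.integers(1, PRIME, size=2))
    return a, b


def elementwise_indices(rows, cols, mem_size, seed=0):
    """Per-weight hashing: every (i, j) maps independently into the shared memory."""
    a, b = _hash_params(seed)
    flat = np.arange(rows * cols, dtype=np.int64)
    return ((a * flat + b) % PRIME % mem_size).reshape(rows, cols)


def tiled_indices(rows, cols, mem_size, tile=8, seed=0):
    """Per-tile hashing: hash once per run of `tile` weights in a row,
    then read `tile` consecutive slots of the shared memory."""
    assert cols % tile == 0 and mem_size >= tile
    a, b = _hash_params(seed)
    tile_ids = np.arange(rows * cols // tile, dtype=np.int64)
    # Start offsets are limited so that each tile fits without wrapping around.
    starts = (a * tile_ids + b) % PRIME % (mem_size - tile + 1)
    idx = starts[:, None] + np.arange(tile, dtype=np.int64)[None, :]
    return idx.reshape(rows, cols)


def materialize(memory, idx):
    """Build the virtual weight matrix by gathering from the shared memory."""
    return memory[idx]


# A 32 x 64 virtual weight matrix backed by only 256 stored parameters.
memory = np.random.default_rng(1).standard_normal(256).astype(np.float32)
idx = tiled_indices(32, 64, memory.size, tile=8)
W = materialize(memory, idx)

x = np.ones((4, 32), dtype=np.float32)
y = x @ W                           # forward pass through the compressed layer
compression = W.size / memory.size  # virtual parameters per stored parameter
```

In this sketch the tiled mapping computes one hash per eight weights and gathers from contiguous memory, which is the kind of access pattern a cache hierarchy favors. The per-weight mapping scatters reads across the whole array. A real kernel would fuse the gather into the matrix multiplication rather than materializing `W`.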


Code & Models

Repositories

apd10/RzLinear (PyTorch)


Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Algorithms and Data Compression

Methods: Attention Is All You Need · Linear Layer · Dropout · Multi-Head Attention · Residual Connection · Weight Decay · Layer Normalization · WordPiece · Softmax