Efficient model compression with Random Operation Access Specific Tile (ROAST) hashing
Aditya Desai, Keren Zhou, Anshumali Shrivastava

TL;DR
This paper introduces ROAST hashing, a cache-efficient, model-agnostic compression method that greatly reduces model size and trains and infers much faster than earlier hashing-based compression (HashedNet), enabling deployment of large models like BERT on edge devices without quality loss.
Contribution
ROAST hashing is a novel, cache-friendly model compression technique that outperforms existing methods and enables the first compressed BERT suitable for resource-constrained devices.
Findings
ROAST is up to 25x faster to train and 50x faster to infer than HashedNet.
ROAST achieves 100x to 1000x compression of BERT without quality degradation.
Global weight sharing in ROAST is empirically and theoretically superior to local sharing.
Abstract
Advancements in deep learning are often associated with increasing model sizes. The model size dramatically affects the deployment cost and latency of deep models. For instance, models like BERT cannot be deployed on edge and mobile devices due to their sheer size. As a result, most advances in deep learning are yet to reach the edge. Model compression has received much-deserved attention in the literature across natural language processing, vision, and recommendation domains. This paper proposes a model-agnostic, cache-friendly model compression approach: Random Operation Access Specific Tile (ROAST) hashing. ROAST collapses the parameters by clubbing them through a lightweight mapping. Notably, while clubbing these parameters, ROAST utilizes cache hierarchies by aligning the memory access pattern with the parameter access pattern. ROAST is up to $\sim 25\times$ faster to train and $\sim…
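The abstract describes the mapping only at a high level. The following is a minimal sketch of the idea as summarized above, under stated assumptions: each layer's weights are cut into fixed-size tiles, only the tile id is hashed to a start offset in one shared parameter array, and weights inside a tile read consecutive slots, so a tile is a single contiguous (cache-friendly) read. A HashedNet-style map, which hashes every weight on its own, is included for contrast. The universal hash, the per-weight sign hash, the tile and memory sizes, the layer names, and every helper name (`make_hash`, `roast_map`, `hashednet_map`, `materialize`) are hypothetical illustration choices, not the authors' implementation.

```python
import random

_PRIME = 2_147_483_647  # 2**31 - 1, modulus for the illustrative hash below


def make_hash(out_range: int, seed: int):
    """Illustrative universal hash x -> ((a*x + b) mod p) mod out_range.
    Assumption for this sketch, not the hash used in the paper."""
    rng = random.Random(seed)
    a, b = rng.randrange(1, _PRIME), rng.randrange(0, _PRIME)
    return lambda x: ((a * x + b) % _PRIME) % out_range


def roast_map(num_weights: int, tile_size: int, mem_size: int, seed: int):
    """Return (slots, signs) for the virtual weights of one layer.

    Only the tile id is hashed; weights within a tile map to consecutive
    slots starting at the hashed offset, so each tile is one contiguous read."""
    if tile_size > mem_size:
        raise ValueError("tile must fit inside the shared memory")
    tile_start = make_hash(mem_size - tile_size + 1, seed)  # start in [0, m - Z]
    sign_bit = make_hash(2, seed + 1)  # assumed per-weight sign in {-1, +1}
    slots, signs = [], []
    for i in range(num_weights):
        tile, offset = divmod(i, tile_size)
        slots.append(tile_start(tile) + offset)
        signs.append(1.0 if sign_bit(i) else -1.0)
    return slots, signs


def hashednet_map(num_weights: int, mem_size: int, seed: int):
    """HashedNet-style contrast (slot assignment only): every weight is
    hashed independently, so neighbouring weights hit scattered slots."""
    slot = make_hash(mem_size, seed)
    return [slot(i) for i in range(num_weights)]


def materialize(memory, slots, signs):
    """Recover the full-size (virtual) weights from the compressed memory."""
    return [s * memory[k] for k, s in zip(slots, signs)]


# Global sharing (assumed setup): every layer hashes into ONE memory array,
# with a different seed per layer. Local sharing would instead give each
# layer its own, smaller array.
MEM_SIZE = 64
_rng = random.Random(42)
memory = [_rng.gauss(0.0, 0.02) for _ in range(MEM_SIZE)]

layer_sizes = {"attn.q": 768, "ffn.in": 3072}  # hypothetical layer sizes
weights = {}
for layer_id, (name, n) in enumerate(layer_sizes.items()):
    slots, signs = roast_map(n, tile_size=16, mem_size=MEM_SIZE, seed=2 * layer_id + 100)
    weights[name] = materialize(memory, slots, signs)

virtual = sum(layer_sizes.values())
print(f"{virtual} virtual weights stored in {MEM_SIZE} floats "
      f"({virtual / MEM_SIZE:.0f}x compression)")
```

With these toy sizes, 3,840 virtual weights are served from 64 stored floats (60x). The design point the sketch is meant to show is the access pattern: `roast_map` returns runs of consecutive slots per tile, whereas `hashednet_map` returns independent, scattered slots for adjacent weights.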
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Algorithms and Data Compression
Methods: Attention Is All You Need · Linear Layer · Dropout · Multi-Head Attention · Residual Connection · Weight Decay · Layer Normalization · WordPiece · Softmax
