Efficient Inference via Universal LSH Kernel
Zichang Liu, Benjamin Coleman, Anshumali Shrivastava

TL;DR
This paper introduces the Representer Sketch, a hashing-based method with provable guarantees for approximating the inference of large models, reporting up to 114x less storage and up to 59x less computation with no observed accuracy drop.
Contribution
The paper presents the Representer Sketch, a kernel-based sketching method for efficient inference with large models, positioned as an alternative that goes beyond existing techniques such as quantization, iterative pruning, and knowledge distillation.
Findings
Up to 114x reduction in storage requirements.
Up to 59x reduction in computation.
No accuracy drop observed.
Abstract
Large machine learning models achieve unprecedented performance on various tasks and have become the go-to technique. However, deploying these compute- and memory-hungry models in resource-constrained environments poses new challenges. In this work, we propose the mathematically provable Representer Sketch, a concise set of count arrays that can approximate the inference procedure with simple hashing computations and aggregations. Representer Sketch builds upon the popular Representer Theorem from the kernel literature, hence the name, providing a generic, fundamental alternative for efficient inference that goes beyond popular approaches such as quantization, iterative pruning, and knowledge distillation. A neural network function is transformed into its weighted kernel density representation, which can be estimated very efficiently with our sketching algorithm. Empirically, we…
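To make "count arrays", "hashing computations", and "aggregations" concrete, here is a minimal sketch of estimating a weighted kernel density f(q) = sum_i alpha_i k(x_i, q) with an LSH-indexed array of counters. It is not the authors' implementation. The class name `WeightedLSHSketch`, the helper `srp_kernel`, the choice of signed random projections as the hash family, and all parameter values are assumptions made for illustration. The kernel k here is the collision probability of that hash family.

```python
import numpy as np


def srp_kernel(x, q, num_bits):
    """Collision probability of num_bits signed random projections (assumed kernel)."""
    cos = x @ q / (np.linalg.norm(x) * np.linalg.norm(q))
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    return (1.0 - theta / np.pi) ** num_bits


class WeightedLSHSketch:
    """Hypothetical count-array sketch of sum_i alpha_i * k(x_i, q)."""

    def __init__(self, dim, num_rows=200, num_bits=4, seed=0):
        rng = np.random.default_rng(seed)
        # One independent hash function per row, each built from num_bits hyperplanes.
        self.planes = rng.standard_normal((num_rows, num_bits, dim))
        self.powers = 1 << np.arange(num_bits)
        # One array of 2**num_bits counters per row.
        self.counts = np.zeros((num_rows, 1 << num_bits))
        self.rows = np.arange(num_rows)

    def _hash(self, x):
        # Sign pattern of the projections, packed into one bucket index per row.
        bits = (self.planes @ x) > 0
        return bits.astype(np.int64) @ self.powers

    def add(self, x, weight):
        # Add the point's weight to the counter it hashes to in every row.
        self.counts[self.rows, self._hash(x)] += weight

    def query(self, q):
        # Read one counter per row and average them (the aggregation step).
        return self.counts[self.rows, self._hash(q)].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.standard_normal((50, 16))
    alphas = rng.standard_normal(50)
    sketch = WeightedLSHSketch(dim=16, num_rows=1000, num_bits=3)
    for x, a in zip(points, alphas):
        sketch.add(x, a)
    q = rng.standard_normal(16)
    exact = sum(a * srp_kernel(x, q, 3) for x, a in zip(points, alphas))
    print("sketch estimate:", sketch.query(q), "exact:", exact)
```

Each row's counter is an unbiased estimate of the weighted kernel sum, because a stored point shares the query's bucket with probability k(x_i, q). Averaging over rows reduces the variance. After construction, the stored points are no longer needed: a query costs one hash and one counter lookup per row, regardless of how many points were inserted.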
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Machine Learning and Data Classification
Methods: Pruning
