Extreme compression of sentence-transformer ranker models: faster inference, longer battery life, and less storage on edge devices
Amit Chaulwar, Lukas Malik, Maciej Krajewski, Felix Reichel, Leif-Nissen Lundbæk, Michael Huth, Bartlomiej Matejczyk

TL;DR
This paper introduces two novel extensions to sentence-transformer distillation—vocabulary size optimization and embedding dimensionality reduction—that enable extreme compression of ranker models, making them suitable for edge devices with limited resources.
Contribution
The paper proposes two extensions to existing distillation methods—vocabulary optimization and embedding reduction—that significantly improve model compression for edge deployment.
Findings
Compressed models achieve high accuracy on test datasets.
Extensions reduce memory and energy consumption substantially.
Models are suitable for deployment on resource-constrained devices.
Abstract
Modern search systems use several large ranker models with transformer architectures. These models require large computational resources and are not suitable for use on devices with limited computational resources. Knowledge distillation is a popular compression technique that can reduce the resource needs of such models: a large teacher model transfers knowledge to a small student model. To drastically reduce memory requirements and energy consumption, we propose two extensions for a popular sentence-transformer distillation procedure: generating a vocabulary of optimal size, and reducing the dimensionality of the teacher's embeddings prior to distillation. We evaluate these extensions on two different types of ranker models. This results in extremely compressed student models, and their analysis on a test dataset shows the significance and utility of our proposed extensions.
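The abstract does not describe how the vocabulary is chosen, so the following is only a minimal sketch of the general idea behind vocabulary reduction, not the authors' procedure. It assumes whitespace tokenization and a simple frequency cutoff; the function names (build_vocab, encode, embedding_parameters) and the toy corpus are hypothetical. It also shows why a smaller vocabulary and a smaller embedding dimension both shrink the token-embedding table, whose parameter count is vocabulary size times embedding dimension.

```python
from collections import Counter

SPECIALS = ("[PAD]", "[UNK]")  # assumed special tokens, not taken from the paper


def build_vocab(corpus, max_size, specials=SPECIALS):
    """Keep the most frequent whitespace tokens, at most max_size entries in total."""
    if max_size < len(specials):
        raise ValueError("max_size must leave room for the special tokens")
    counts = Counter(tok for text in corpus for tok in text.lower().split())
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    keep = max_size - len(specials)
    tokens = list(specials) + [tok for tok, _ in ranked[:keep]]
    return {tok: idx for idx, tok in enumerate(tokens)}


def encode(text, vocab):
    """Map each token to its id; tokens outside the vocabulary become [UNK]."""
    unk = vocab["[UNK]"]
    return [vocab.get(tok, unk) for tok in text.lower().split()]


def embedding_parameters(vocab_size, dim):
    """Number of weights in a token-embedding table of shape (vocab_size, dim)."""
    return vocab_size * dim


corpus = [
    "cheap flights to berlin",
    "cheap hotels in berlin",
    "weather in berlin",
]
vocab = build_vocab(corpus, max_size=5)
ids = encode("cheap trains in berlin", vocab)  # "trains" falls back to [UNK]

# Illustrative sizes only: a 30522 x 768 table versus a hypothetical 5000 x 128 one.
full_table = embedding_parameters(30522, 768)
small_table = embedding_parameters(5000, 128)
```

In this toy setting the embedding table shrinks by the product of the two reduction factors, which is why the two extensions compound. A real pipeline would more likely operate on subword tokens and measure ranking quality after each cut.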
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Topic Modeling
Methods: Test · Knowledge Distillation
