Extreme compression of sentence-transformer ranker models: faster inference, longer battery life, and less storage on edge devices

Amit Chaulwar; Lukas Malik; Maciej Krajewski; Felix Reichel; Leif-Nissen Lundbæk; Michael Huth; Bartlomiej Matejczyk

arXiv:2207.12852 · cs.LG · July 27, 2022 · 1 citation

TL;DR

This paper introduces two novel extensions to sentence-transformer distillation (vocabulary size optimization and reduction of the teacher's embedding dimensionality) that enable extreme compression of ranker models and make them suitable for edge devices with limited resources.
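This page does not say how the embedding dimensionality is reduced. A minimal sketch, assuming principal component analysis (PCA) fitted on teacher sentence embeddings (a common choice, not confirmed here as the authors' method): the teacher vectors are centered and projected onto their top-k principal directions, and the reduced vectors become the targets the student learns to reproduce. The function name `pca_reduce` and the random stand-in embeddings are hypothetical.

```python
import numpy as np

def pca_reduce(teacher_emb: np.ndarray, k: int):
    """Project teacher embeddings (n x d) onto their top-k principal components.

    Returns the reduced embeddings (n x k), plus the mean and the (d x k)
    projection matrix so that new teacher outputs can be reduced the same way.
    """
    mean = teacher_emb.mean(axis=0)
    centered = teacher_emb - mean
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k].T  # shape (d, k)
    return centered @ components, mean, components

# Hypothetical stand-in for teacher sentence embeddings: 500 sentences, 768 dims.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(500, 768))
reduced, mean, components = pca_reduce(teacher, k=128)
print(reduced.shape)  # (500, 128)
```

Because the projection is linear, it can be applied to any later teacher output as `(x - mean) @ components`. Folding it into a final dense layer for deployment is likewise an assumption, not a detail given on this page.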

Contribution

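The paper proposes two extensions to an existing sentence-transformer distillation procedure: generating a vocabulary of optimal size, and reducing the teacher's embedding dimension before distillation. Together they substantially increase how far ranker models can be compressed for edge deployment.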
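Why both extensions save storage can be seen from the token-embedding table alone. It holds V × d parameters for vocabulary size V and embedding width d, so shrinking either factor shrinks the table proportionally. The sizes below are illustrative only, not figures from the paper. They assume float32 storage and a student whose token-embedding width equals its output dimension. The baseline mirrors BERT-base (30,522 tokens × 768 dims); the compact pair is hypothetical.

```python
def embedding_table_bytes(vocab_size: int, dim: int, bytes_per_param: int = 4) -> int:
    """Storage in bytes for a token-embedding matrix of shape (vocab_size, dim)."""
    return vocab_size * dim * bytes_per_param

# Illustrative sizes only (not results from the paper):
# a BERT-base-sized table vs. a hypothetical smaller vocabulary and width.
baseline = embedding_table_bytes(30522, 768)
compact = embedding_table_bytes(8000, 256)
print(baseline, compact, round(baseline / compact, 1))  # 93763584 8192000 11.4
```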

Findings

1. Compressed models achieve high accuracy on the test dataset.
2. The extensions substantially reduce memory and energy consumption.
3. The compressed models are suitable for deployment on resource-constrained devices.

Abstract

Modern search systems use several large ranker models with transformer architectures. These models require large computational resources and are not suitable for use on devices with limited computational resources. Knowledge distillation, in which a large teacher model transfers its knowledge to a small student model, is a popular compression technique that can reduce the resource needs of such models. To drastically reduce memory requirements and energy consumption, we propose two extensions for a popular sentence-transformer distillation procedure: generation of an optimally sized vocabulary, and reduction of the teachers' embedding dimension prior to distillation. We evaluate these extensions on two different types of ranker models. The result is a set of extremely compressed student models; their evaluation on a test dataset demonstrates the significance and utility of our proposed extensions.
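The distillation step can be illustrated with an embedding-matching objective. This sketch assumes it matches the one in sentence-transformers' distillation examples, where the student is trained to minimize the mean squared error (MSE) between its sentence embeddings and the teacher's. That this is exactly the paper's objective is an assumption. The toy replaces the student transformer with a single linear map trained by plain gradient descent, and the teacher outputs are random stand-ins. Both simplifications, and the names `mse_distillation_loss`, `features`, and `teacher_targets`, are hypothetical.

```python
import numpy as np

def mse_distillation_loss(student_emb: np.ndarray, teacher_emb: np.ndarray) -> float:
    """Mean squared error between student and teacher sentence embeddings."""
    return float(np.mean((student_emb - teacher_emb) ** 2))

# Toy setup (hypothetical): the "student" is one linear map from 32 input
# features to the teacher's 16-dim (e.g. already reduced) embedding space.
rng = np.random.default_rng(1)
features = rng.normal(size=(256, 32))
teacher_targets = features @ rng.normal(size=(32, 16))  # stand-in teacher outputs
weights = np.zeros((32, 16))

lr = 0.5
losses = []
for step in range(200):
    student = features @ weights
    losses.append(mse_distillation_loss(student, teacher_targets))
    # Gradient of the mean squared error with respect to the weights.
    grad = 2.0 * features.T @ (student - teacher_targets) / student.size
    weights -= lr * grad

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.6f}")
```

Only the teacher's outputs are needed as targets, so the teacher embeddings (optionally PCA-reduced) can be computed once and cached before training the student.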

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Topic Modeling

Methods: Test · Knowledge Distillation