TL;DR
This paper introduces Homomorphic Projective Distillation, a method to create compact sentence embeddings that retain high quality, significantly improving efficiency and performance in semantic retrieval and similarity tasks.
Contribution
The paper proposes a novel distillation technique that enables small models to mimic large pre-trained models, producing high-quality, compressed sentence representations.
Findings
Improves STS performance by 2.7-4.5 points over the previous best representations of the same size.
Speeds up retrieval by 8.2 times.
Reduces memory usage by 8 times.
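As a rough illustration of where a memory saving of this size can come from, the sketch below computes the storage needed for a dense embedding index at two embedding widths. The widths (1024 and 128), the corpus size, and the float32 storage format are illustrative assumptions, not figures taken from this summary; the function name is hypothetical.

```python
def embedding_index_bytes(num_sentences, dim, bytes_per_value=4):
    """Bytes needed to store one dense vector per sentence (float32 by default)."""
    return num_sentences * dim * bytes_per_value

# Assumed, illustrative settings: one million sentences, float32 values.
NUM_SENTENCES = 1_000_000
large_index = embedding_index_bytes(NUM_SENTENCES, dim=1024)  # assumed large-model width
small_index = embedding_index_bytes(NUM_SENTENCES, dim=128)   # assumed compact width

# Storage scales linearly with the embedding width, so the ratio is 1024 / 128.
memory_ratio = large_index / small_index
print(large_index, small_index, memory_ratio)
```

Because index size grows linearly with the embedding width, any eightfold reduction in width gives an eightfold reduction in storage under these assumptions, independent of the corpus size.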
Abstract
How to learn highly compact yet effective sentence representation? Pre-trained language models have been effective in many NLP tasks. However, these models are often huge and produce large sentence embeddings. Moreover, there is a big performance gap between large and small models. In this paper, we propose Homomorphic Projective Distillation (HPD) to learn compressed sentence embeddings. Our method augments a small Transformer encoder model with learnable projection layers to produce compact representations while mimicking a large pre-trained language model to retain the sentence representation quality. We evaluate our method with different model sizes on both semantic textual similarity (STS) and semantic retrieval (SR) tasks. Experiments show that our method achieves 2.7-4.5 points performance gain on STS tasks compared with previous best representations of the same size. In SR…
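The idea in the abstract, a small encoder with a learnable projection layer trained to mimic a large model's sentence representations in a compact space, can be sketched on toy data. This is a minimal sketch under stated assumptions, not the authors' implementation: the student features and teacher embeddings are random stand-ins rather than Transformer outputs, the fixed matrix `P` that reduces the teacher embeddings is a hypothetical placeholder for whatever reduction the paper uses, and the mean squared error objective and all dimensions are assumptions.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mse(pred, target):
    """Mean squared error between two vectors of equal length."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def mean_loss(W, feats, targets):
    """Average distillation loss of projection W over the whole toy dataset."""
    return sum(mse(matvec(W, x), t) for x, t in zip(feats, targets)) / len(feats)

def train_projection(feats, targets, lr=0.05, epochs=200, seed=0):
    """Fit a learnable linear projection with plain SGD on the MSE loss."""
    rng = random.Random(seed)
    in_dim, out_dim = len(feats[0]), len(targets[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    for _ in range(epochs):
        for x, t in zip(feats, targets):
            y = matvec(W, x)
            for i in range(out_dim):
                grad_out = 2.0 * (y[i] - t[i]) / out_dim  # d(loss)/d(y[i])
                for j in range(in_dim):
                    W[i][j] -= lr * grad_out * x[j]
    return W

# Toy dimensions (assumptions chosen only to keep the example small).
STUDENT_DIM, TEACHER_DIM, COMPACT_DIM = 4, 8, 2
rng = random.Random(42)

# Stand-in for the small encoder's sentence features.
student_feats = [[rng.uniform(-1, 1) for _ in range(STUDENT_DIM)] for _ in range(32)]

# Stand-in for the large model: a fixed linear map of the same inputs, so the
# toy problem is exactly learnable. A real teacher is a pre-trained encoder.
A = [[rng.uniform(-1, 1) for _ in range(STUDENT_DIM)] for _ in range(TEACHER_DIM)]
teacher_embs = [matvec(A, x) for x in student_feats]

# Hypothetical fixed reduction of teacher embeddings to the compact size.
P = [[rng.uniform(-1, 1) for _ in range(TEACHER_DIM)] for _ in range(COMPACT_DIM)]
targets = [matvec(P, e) for e in teacher_embs]

# epochs=0 returns the untrained projection (same seed, same initialization).
initial_loss = mean_loss(train_projection(student_feats, targets, epochs=0),
                         student_feats, targets)
W = train_projection(student_feats, targets)
final_loss = mean_loss(W, student_feats, targets)
print(initial_loss, final_loss)
```

In this sketch only the projection is trained; in a full setup the small encoder's own parameters would presumably be updated together with the projection layer.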
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Byte Pair Encoding · Softmax · Dense Connections · Residual Connection · Dropout · Layer Normalization
