Multi-stage Distillation Framework for Cross-Lingual Semantic Similarity Matching

Kunbo Ding; Weijie Liu; Yuejian Fang; Zhe Zhao; Qi Ju; Xuefeng Yang

arXiv:2209.05869·cs.CL·September 14, 2022

TL;DR

This paper introduces a multi-stage distillation framework that compresses cross-lingual models such as XLM-R and MiniLM by more than 50% while reducing performance by only about 1%, making deployment on memory-limited devices more practical.

Contribution

It proposes a multi-stage distillation approach that combines contrastive learning, bottleneck, and parameter recurrent strategies so that small cross-lingual models keep most of the performance of the models they are compressed from.

Findings

01. Compressed models by over 50% in size.
02. Performance reduced by only about 1%.
03. Effective for deployment on memory-constrained devices.

Abstract

Previous studies have proved that cross-lingual knowledge distillation can significantly improve the performance of pre-trained models for cross-lingual similarity matching tasks. However, the student model needs to be large in this operation. Otherwise, its performance will drop sharply, thus making it impractical to be deployed to memory-limited devices. To address this issue, we delve into cross-lingual knowledge distillation and propose a multi-stage distillation framework for constructing a small-size but high-performance cross-lingual model. In our framework, contrastive learning, bottleneck, and parameter recurrent strategies are combined to prevent performance from being compromised during the compression process. The experimental results demonstrate that our method can compress the size of XLM-R and MiniLM by more than 50%, while the performance is only reduced by about 1%.
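
The abstract names contrastive learning as one ingredient but does not give the loss. As a rough illustration only, the sketch below shows a generic in-batch contrastive (InfoNCE-style) distillation objective: each student sentence embedding is pulled toward the teacher embedding of its paired sentence and pushed away from the other teacher embeddings in the batch. The function name `contrastive_distillation_loss`, the `temperature` value, and the toy vectors are all assumptions made for this example; none of them is taken from the paper or its repository.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def contrastive_distillation_loss(student, teacher, temperature=0.05):
    """Hypothetical InfoNCE-style loss, not the paper's exact objective.

    student[i] and teacher[i] are embeddings of a paired sentence (for
    example, a translation and its source). Row i of the similarity matrix
    is treated as a classification problem whose correct class is column i.
    """
    s = [l2_normalize(v) for v in student]
    t = [l2_normalize(v) for v in teacher]
    total = 0.0
    for i, s_i in enumerate(s):
        logits = [dot(s_i, t_j) / temperature for t_j in t]
        # Numerically stable log-sum-exp over all teacher candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching teacher embedding as the target.
        total += log_denominator - logits[i]
    return total / len(s)


# Toy batch of two sentence pairs with 2-dimensional embeddings.
teacher = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # student agrees with the teacher pairing
swapped = [[0.1, 0.9], [0.9, 0.1]]   # student matches the wrong teacher rows

loss_aligned = contrastive_distillation_loss(aligned, teacher)
loss_swapped = contrastive_distillation_loss(swapped, teacher)
print(loss_aligned < loss_swapped)
```

In this toy batch the aligned student receives a much lower loss than the swapped one, which is the behaviour a contrastive distillation term is meant to reward. A real setup would compute the same quantity on batched tensors from the teacher and student encoders.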

Code & Models

Repositories

KB-Ding/Multi-stage-Distillaton-Framework
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: XLM-R · Knowledge Distillation