Emu: Enhancing Multilingual Sentence Embeddings with Semantic Specialization

Wataru Hirota; Yoshihiko Suhara; Behzad Golshan; Wang-Chiew Tan

arXiv:1909.06731 · cs.CL · November 26, 2019

TL;DR

Emu is a system that improves multilingual sentence embeddings through semantic fine-tuning, yielding better cross-lingual intent classification while training on monolingual labeled data only.

Contribution

It introduces a novel semantic specialization framework that fine-tunes pre-trained multilingual sentence embeddings by combining a semantic classifier with a language discriminator trained adversarially.
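The interplay between the two components can be illustrated with a minimal loss sketch. It assumes a standard minimax formulation of adversarial training: the language discriminator minimizes cross-entropy over language labels, while the encoder and semantic classifier minimize the intent-classification loss minus a weight `lam` times the discriminator loss, so that the language of a sentence becomes hard to predict from its embedding. The choice of cross-entropy for both losses, the weight `lam`, and every function name here are illustrative assumptions, not Emu's published implementation; the sketch only computes loss values from precomputed logits and omits the encoder and the gradient updates.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    # Negative log-probability assigned to the target class index.
    return -math.log(softmax(logits)[target])


def mean_cross_entropy(batch_logits, targets):
    # Average cross-entropy over a batch of (logits, target) pairs.
    return sum(cross_entropy(l, t) for l, t in zip(batch_logits, targets)) / len(targets)


def discriminator_loss(lang_logits, lang_labels):
    # The discriminator is trained to predict each sentence's language.
    return mean_cross_entropy(lang_logits, lang_labels)


def encoder_loss(intent_logits, intent_labels, lang_logits, lang_labels, lam=0.1):
    # Assumed minimax objective (not taken from the paper): fit the intent labels
    # while *maximizing* the discriminator's loss, i.e. fooling it.
    # `lam` is a hypothetical trade-off weight.
    semantic = mean_cross_entropy(intent_logits, intent_labels)
    return semantic - lam * discriminator_loss(lang_logits, lang_labels)


# Hypothetical toy logits: one labeled intent example and one language prediction.
print(round(encoder_loss([[0.0, 0.0, 0.0]], [0], [[0.0, 0.0]], [1], lam=0.5), 4))  # prints 0.752
```

Subtracting the discriminator loss is one common way to write adversarial training; a gradient reversal layer expresses the same idea inside a single backward pass.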

Findings

01. Outperforms the state-of-the-art multilingual sentence embedding model on cross-lingual intent classification
02. Uses only monolingual labeled data for training
03. Enhances both the semantic similarity and the multilinguality of the embeddings

Abstract

We present Emu, a system that semantically enhances multilingual sentence embeddings. Our framework fine-tunes pre-trained multilingual sentence embeddings using two main components: a semantic classifier and a language discriminator. The semantic classifier improves the semantic similarity of related sentences, whereas the language discriminator enhances the multilinguality of the embeddings via multilingual adversarial training. Our experimental results based on several language pairs show that our specialized embeddings outperform the state-of-the-art multilingual sentence embedding model on the task of cross-lingual intent classification using only monolingual labeled data.
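Cross-lingual intent classification with only monolingual labeled data can be pictured as zero-shot transfer: a classifier is fit on labeled embeddings in one language and then applied unchanged to embeddings of sentences in another language, which works only if the shared embedding space places translations close together. The sketch below uses a nearest-centroid classifier over cosine similarity as a stand-in; the paper's actual classifier may differ, and the two-dimensional vectors and intent names are hypothetical toy values, not real sentence embeddings.

```python
import math


def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def fit_centroids(embeddings, labels):
    # Average the embeddings of each intent; labels come from one language only.
    groups = {}
    for vec, label in zip(embeddings, labels):
        groups.setdefault(label, []).append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in groups.items()
    }


def predict(centroids, vec):
    # Assign the intent whose centroid is most similar to the query embedding.
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


# Hypothetical toy embeddings: labeled English training sentences...
english = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
intents = ["weather", "weather", "alarm"]
centroids = fit_centroids(english, intents)

# ...and an unlabeled sentence in another language, assumed to land near its
# English paraphrases in the shared multilingual space.
print(predict(centroids, [0.8, 0.2]))  # prints weather
```

Nearest-centroid keeps the example short; any classifier trained on source-language embeddings, such as logistic regression, fits the same transfer setup.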

Code & Models

Repositories

megagonlabs/emu (official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification