Language Embeddings for Typology and Cross-lingual Transfer Learning

Dian Yu; Taiqi He; Kenji Sagae

arXiv:2106.02082 · cs.CL · June 7, 2021 · 1 citation

TL;DR

This paper investigates whether dense language embeddings, learned without parallel data, can capture relationships among languages and improve zero-shot cross-lingual tasks such as dependency parsing and natural language inference.

Contribution

It introduces a method to generate language embeddings using a denoising autoencoder and evaluates their effectiveness in cross-lingual tasks without parallel data.

Findings

01 Embeddings capture linguistic relationships consistent with WALS

02 Effective in zero-shot cross-lingual dependency parsing

03 Improves cross-lingual natural language inference

Abstract

Cross-lingual language tasks typically require a substantial amount of annotated data or parallel translation data. We explore whether language representations that capture relationships among languages can be learned and subsequently leveraged in cross-lingual tasks without the use of parallel data. We generate dense embeddings for 29 languages using a denoising autoencoder, and evaluate the embeddings using the World Atlas of Language Structures (WALS) and two extrinsic tasks in a zero-shot setting: cross-lingual dependency parsing and cross-lingual natural language inference.
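
One simple way to inspect whether such embeddings reflect relationships among languages is to rank languages by cosine similarity. The following is a minimal sketch, not the authors' evaluation procedure: the vectors are made-up toy values (not the paper's embeddings), and the helper names `cosine` and `nearest_languages` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_languages(embeddings, query, k=2):
    """Return the k languages most similar to `query`, best first."""
    scores = [
        (lang, cosine(embeddings[query], vec))
        for lang, vec in embeddings.items()
        if lang != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

# Toy, made-up 3-dimensional vectors for illustration only (assumption).
toy_embeddings = {
    "es": [0.9, 0.1, 0.0],
    "pt": [0.8, 0.2, 0.1],
    "ja": [0.0, 0.2, 0.9],
    "ko": [0.1, 0.1, 0.8],
}

neighbors = nearest_languages(toy_embeddings, "es", k=1)
print(neighbors)
```

With real embeddings, the resulting neighbor lists could then be compared against typological groupings such as those recorded in WALS.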

Code & Models

Repositories

DianDYu/language_embeddings (tf, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems