Language Embeddings for Typology and Cross-lingual Transfer Learning
Dian Yu, Taiqi He, Kenji Sagae

TL;DR
This paper investigates whether dense language embeddings, learned without parallel data, can capture linguistic relationships among languages and improve zero-shot cross-lingual tasks such as dependency parsing and natural language inference.
Contribution
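The paper introduces a method that generates language embeddings for 29 languages with a denoising autoencoder, without parallel data. It evaluates them intrinsically against the World Atlas of Language Structures (WALS) and extrinsically on zero-shot cross-lingual dependency parsing and natural language inference.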
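A denoising autoencoder learns by reconstructing a clean input from a corrupted copy. The summary does not say which corruption the authors apply, so the sketch below is only a hypothetical illustration. It uses word dropout followed by local shuffling, a scheme common in unsupervised machine translation work, and it is not necessarily the paper's noise model. The function name `corrupt` and its parameters `drop_prob` and `shuffle_k` are assumptions.

```python
import random
from typing import List, Optional


def corrupt(tokens: List[str], drop_prob: float = 0.1, shuffle_k: float = 3.0,
            rng: Optional[random.Random] = None) -> List[str]:
    """Hypothetical noise model for a denoising autoencoder (an assumption,
    not the paper's method): word dropout, then local shuffling."""
    rng = rng or random.Random()
    # 1. Word dropout: keep each token independently with prob 1 - drop_prob.
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept and tokens:
        # Never return an empty sentence; keep one random token instead.
        kept = [rng.choice(tokens)]
    # 2. Local shuffle: add uniform noise in [0, shuffle_k] to each index and
    #    sort by the noisy keys, so tokens move only a short distance.
    keys = [i + rng.uniform(0, shuffle_k) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    noisy = corrupt(sentence, rng=random.Random(0))
    # A denoising autoencoder would be trained to map `noisy` back to `sentence`.
    print(noisy)
```

Setting `shuffle_k=0` disables shuffling and `drop_prob=0` disables dropout. This makes it easy to check each corruption step in isolation.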
Findings
Embeddings capture linguistic relationships consistent with WALS
Effective in zero-shot cross-lingual dependency parsing
Improves cross-lingual natural language inference
Abstract
Cross-lingual language tasks typically require a substantial amount of annotated data or parallel translation data. We explore whether language representations that capture relationships among languages can be learned and subsequently leveraged in cross-lingual tasks without the use of parallel data. We generate dense embeddings for 29 languages using a denoising autoencoder, and evaluate the embeddings using the World Atlas of Language Structures (WALS) and two extrinsic tasks in a zero-shot setting: cross-lingual dependency parsing and cross-lingual natural language inference.
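The abstract states that the embeddings are evaluated against WALS but does not give the protocol. One plausible check, sketched here as an assumption rather than the paper's procedure, asks whether embedding geometry mirrors typology. For every pair of languages, compare the cosine similarity of their embeddings with the fraction of jointly attested WALS features on which they agree. Then report the Spearman rank correlation across all pairs. All function names, the toy vectors, the language labels, and the feature IDs below are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def wals_agreement(f: Dict[str, str], g: Dict[str, str]) -> Optional[float]:
    """Fraction of features attested for both languages that share a value."""
    shared = f.keys() & g.keys()
    if not shared:
        return None
    return sum(f[k] == g[k] for k in shared) / len(shared)


def ranks(xs: List[float]) -> List[float]:
    """1-based average ranks, so tied values share the same rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(xs: List[float], ys: List[float]) -> float:
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def embedding_wals_correlation(emb: Dict[str, List[float]],
                               wals: Dict[str, Dict[str, str]]) -> float:
    sims, agrees = [], []
    for a, b in combinations(sorted(emb.keys() & wals.keys()), 2):
        agree = wals_agreement(wals[a], wals[b])
        if agree is None:  # no jointly attested features for this pair
            continue
        sims.append(cosine(emb[a], emb[b]))
        agrees.append(agree)
    return spearman(sims, agrees)


if __name__ == "__main__":
    # Toy, made-up data: not real embeddings or real WALS values.
    emb = {"A": [1.0, 0.0], "B": [1.0, 0.1], "C": [0.0, 1.0]}
    wals = {"A": {"F1": "x", "F2": "y"},
            "B": {"F1": "x", "F2": "y"},
            "C": {"F1": "z", "F2": "w"}}
    print(round(embedding_wals_correlation(emb, wals), 3))  # prints 0.866
```

Real WALS data is sparse, so restricting each comparison to features attested for both languages keeps missing values from counting as disagreements.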
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
