Bootstrapping Transliteration with Constrained Discovery for Low-Resource Languages

Shyam Upadhyay; Jordan Kodner; Dan Roth

arXiv:1809.07807·cs.CL·September 24, 2018·1 cites

TL;DR

This paper introduces a bootstrapping method that uses constrained discovery to improve transliteration generation for low-resource languages. It works with as few as 500 training examples, where existing approaches require more than 5,000.

Contribution

The work presents a novel bootstrapping algorithm that enables effective transliteration generation with as few as 500 training examples, expanding applicability to low-resource languages.

Findings

01 Effective transliteration generation with limited data

02 Improved cross-lingual entity linking performance

03 Successful evaluation across nine diverse languages

Abstract

Generating the English transliteration of a name written in a foreign script is an important and challenging step in multilingual knowledge acquisition and information extraction. Existing approaches to transliteration generation require a large number (>5000) of training examples. This difficulty contrasts with transliteration discovery, a somewhat easier task that involves picking a plausible transliteration from a given list. In this work, we present a bootstrapping algorithm that uses constrained discovery to improve generation and can be used with as few as 500 training examples, which we show can be sourced from annotators in a matter of hours. This opens the task to languages for which large numbers of training examples are unavailable. We evaluate transliteration generation performance itself, as well as the improvement it brings to cross-lingual candidate generation for entity…
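
The interplay between generation and discovery described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the one-to-one character map standing in for the generation model, the equal-length constraint, the similarity threshold `min_sim`, the Greek-letter "foreign script", and every function name below are assumptions made purely for illustration. The idea shown is that a weak generator proposes a guess, discovery picks the closest entry from a fixed name list under constraints, and confident picks are fed back as new training pairs.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def train(pairs):
    """Learn a toy character map from equal-length (source, target) pairs."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        if len(src) != len(tgt):
            continue  # toy assumption: one-to-one character alignment
        for s, t in zip(src, tgt):
            counts[s][t] += 1
    # Keep the most frequent target character for each source character.
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def generate(model, src):
    """Transliterate character by character; '?' marks unseen characters."""
    return "".join(model.get(ch, "?") for ch in src)


def discover(model, src, candidates, min_sim=0.6):
    """Constrained discovery: pick the best entry from a fixed name list.

    Hypothetical constraints: same length as the source word, and a
    similarity to the generator's guess strictly above `min_sim`.
    """
    guess = generate(model, src)
    best, best_sim = None, min_sim
    for cand in candidates:
        if len(cand) != len(src):
            continue
        sim = SequenceMatcher(None, guess, cand).ratio()
        if sim > best_sim:
            best, best_sim = cand, sim
    return best


def bootstrap(seed_pairs, unlabeled, candidates, rounds=3):
    """Grow the training set with discovered pairs, retraining each round."""
    pairs = list(seed_pairs)
    model = train(pairs)
    remaining = list(unlabeled)
    for _ in range(rounds):
        unresolved = []
        for src in remaining:
            match = discover(model, src, candidates)
            if match is None:
                unresolved.append(src)
            else:
                pairs.append((src, match))
        if len(unresolved) == len(remaining):
            break  # nothing new was discovered this round
        remaining = unresolved
        model = train(pairs)
    return model


# Toy data: Greek letters stand in for a foreign script.
seed = [("αβγ", "abg"), ("δε", "de")]
unlabeled = ["ζζε", "βαδζ"]
name_list = ["abed", "gaze", "badz", "zze"]

seed_model = train(seed)
boot_model = bootstrap(seed, unlabeled, name_list)
print(generate(seed_model, "ζαβ"))  # 'ζ' is unseen in the seed pairs
print(generate(boot_model, "ζαβ"))  # 'ζ' was learned through discovery
```

In this toy run the word "ζζε" cannot be resolved in the first round because the seed model has never seen "ζ"; it becomes resolvable only after "βαδζ" is matched against the name list and the model is retrained. A real system would replace the character map with a learned sequence model and use richer constraints.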

Code & Models

Repositories

shyamupa/hma-translit
pytorch

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications